(54) Title: METHODS AND COMPOSITIONS FOR INTERACTION TRAP ASSAYS

(57) Abstract: The present invention provides methods and compositions for interaction trap assays for detecting protein-protein, 
protein-DNA, or protein-RNA interactions. The methods and compositions of the invention may also be used to identify agents which 
may agonize or antagonize a protein-protein, protein-DNA, or protein-RNA interaction. In certain embodiments, the interaction trap
system of the invention is useful for screening libraries with greater than 10^ members. In other embodiments, the interaction trap 
system of the invention is used in conjunction with flow cytometry. The invention further provides a means for simultaneously 
screening a target protein or nucleic acid sequence for the ability to interact with two or more test proteins or nucleic acids.

Methods and Compositions for Interaction Trap Assays

Background of the Invention

Specific protein-DNA and protein-protein interactions are fundamental to most 
cellular functions. Protein-DNA interactions, for example, form the basis of important 
mechanisms by which the cell activates or represses gene expression and regulates DNA 
replication. Polypeptide interactions are involved in, inter alia, formation of functional
transcription complexes, repression of certain genes, signal transduction pathways,
cytoskeletal organization (e.g., microtubule polymerization), polypeptide hormone
receptor-ligand binding, organization of multi-subunit enzyme complexes, and the like. 

Investigation of protein-DNA and protein-protein interactions under physiological 
conditions has been problematic. Considerable effort has been made to identify proteins
that bind to proteins of interest. Typically, these interactions have been detected by using 
co-precipitation experiments in which an antibody to a known protein is mixed with a cell 
extract and used to precipitate the known protein and any proteins that are stably associated 
with it. This method has several disadvantages, such as: (1) it only detects proteins which 
are associated in cell extract conditions rather than under physiological, intracellular
conditions, (2) it only detects proteins which bind to the known protein with sufficient 
strength and stability for efficient co-immunoprecipitation, (3) it may not be able to detect 
oligomers of the target, and (4) it fails to detect associated proteins which are displaced 
from the known protein upon antibody binding. Additionally, precipitation techniques at 
best provide a molecular weight as the main identifying characteristic. Similar difficulties
exist in the analysis of physiologically relevant protein-DNA interactions. For these 
reasons and others, improved methods for identifying proteins that interact with a known
protein have been developed. 

One approach to these problems has been to use a so-called interaction trap system
or "ITS" (also referred to as the "two-hybrid assay") to identify polypeptide sequences
which bind to a predetermined polypeptide sequence present in a fusion protein (Fields and
Song (1989) Nature 340:245). This approach identifies protein-protein interactions in vivo
through reconstitution of a eukaryotic transcriptional activator. The system has also been 
adapted for studying protein-DNA interactions. 

The interaction trap systems of the prior art are based on the finding that most
eukaryotic transcription activators are modular. Brent and Ptashne showed that the
activation domain of yeast GAL4, a yeast transcription factor, could be fused to the DNA
binding domain of E. coli LexA to create a functional transcription activator in yeast (Brent
et al. (1985) Cell 43:729-736). There is evidence that transcription can be activated through 
the use of two functional domains of a transcription factor: a domain that recognizes and 
binds to a specific site on the DNA and a domain that is necessary for activation. The 
transcriptional activation domain is thought to function by contacting other proteins
involved in transcription. The DNA-binding domain appears to function to position the
transcriptional activation domain on the target gene that is to be transcribed. These and 
similar experiments (Keegan et al. (1986) Science 231:699-704) formally define activation 
domains as portions of proteins that activate transcription when brought to DNA by DNA-
binding domains. Moreover, it was discovered that the DNA binding domain does not have
to be physically on the same polypeptide as the activation domain, so long as the two
separate polypeptides interact with one another (Ma et al. (1988) Cell 55:443-446).

Fields and his coworkers made the seminal suggestion that protein interactions 
could be detected if two potentially interacting proteins were expressed as chimeras. They
devised a method based on the properties of the yeast Gal4 protein, which
consists of separable domains responsible for DNA-binding and transcriptional activation.
Polynucleotides encoding two hybrid proteins, one consisting of the yeast Gal4 DNA- 
binding domain fused to a polypeptide sequence of a known protein and the other 
consisting of the Gal4 activation domain fused to a polypeptide sequence of a second 
protein, are constructed and introduced into a yeast host cell. Intermolecular binding
between the two fusion proteins reconstitutes the Gal4 DNA-binding domain with the Gal4
activation domain, which leads to the transcriptional activation of a reporter gene (e.g., 
lacZ, HIS3) which is operably linked to a Gal4 binding site. 

All yeast-based interaction trap systems in the art share common elements (Chien et
al. (1991) PNAS 88:9578-82; Durfee et al. (1993) Genes & Development 7:555-69; Gyuris
et al. (1993) Cell 75:791-803; and Vojtek et al. (1993) Cell 74:205-14). All use (1) a
plasmid that directs the synthesis of a "bait": a known protein which is brought to DNA by
being fused to a DNA binding domain, (2) one or more reporter genes ("reporters") with
upstream binding sites for the bait fusion, and (3) a plasmid that directs the synthesis of
proteins fused to activation domains and other useful moieties ("prey"). All current systems
direct the synthesis of proteins that carry the activation domain at the amino terminus of the
fusion, facilitating the expression of open reading frames encoded by, for example, cDNAs. 

Due to an upper limit on the transformation efficiency of yeast cells of ~10^, the
yeast-based one-hybrid and two-hybrid systems are not practical for use in the analysis of
libraries larger than 10^ in size. For the analysis of most cDNA libraries, the ability to
cover libraries 10^ to 10^ in size is adequate. However, there are a number of situations in
which the inability to search a library larger than 10^ in size is problematic. One example is
the challenge of searching libraries containing randomized sequences. For example, a
strategy for randomizing at just six different residues in a test polypeptide can produce a
library of variants which exceeds the practical use of the yeast interaction trap systems. To
illustrate, if one employs a strategy using 24 different codons (encoding 19 different amino
acids) at each of the six positions, the resulting library will have a potential DNA sequence
space of 24^6 or ~2 x 10^8 and an amino acid sequence space of 19^6 or ~5 x 10^7. To ensure
nearly complete coverage of such a library, one needs to oversample by a factor of at least
three-fold (i.e., one must sample 3 x 2 x 10^8 candidates). The difficulty with library size
becomes exponentially more problematic with each additional residue that is randomized. 
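The arithmetic in the randomization example can be checked with a short sketch. The function names below are illustrative and not part of the specification, and the coverage estimate assumes Poisson sampling of equally represented variants, which is a simplifying assumption added here rather than a statement from the text.

```python
import math


def sequence_space(choices_per_position: int, positions: int) -> int:
    """Number of distinct sequences when each position varies independently."""
    return choices_per_position ** positions


def clones_to_sample(space: int, oversampling: float = 3.0) -> int:
    """Candidates to examine for a given fold-oversampling of the space."""
    return math.ceil(space * oversampling)


def expected_coverage(oversampling: float) -> float:
    """Fraction of variants expected to be seen at least once.

    Assumption (not from the source): variants are equally likely and
    sampling follows a Poisson model, so coverage = 1 - exp(-fold).
    """
    return 1.0 - math.exp(-oversampling)


# 24 codons (encoding 19 amino acids) at each of six randomized positions
dna_space = sequence_space(24, 6)
protein_space = sequence_space(19, 6)
needed = clones_to_sample(dna_space, oversampling=3.0)

print(f"DNA sequence space:      {dna_space:,}")
print(f"Amino acid space:        {protein_space:,}")
print(f"Candidates at 3x:        {needed:,}")
print(f"Expected coverage at 3x: {expected_coverage(3.0):.3f}")

# Each additional randomized position multiplies the DNA space by 24
print(f"Seven positions:         {sequence_space(24, 7):,}")
```

Under the Poisson assumption, three-fold oversampling leaves roughly 5% of variants unsampled, which is one way to read "nearly complete coverage."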

Another approach used to study protein-DNA and protein-protein interactions is the
method of phage display. In this system, proteins are displayed on the surface of
filamentous bacteriophage (e.g., M13) that harbor the DNA encoding the displayed
protein. Target proteins or DNA sequences of interest are immobilized on a solid support
(typically plates or beads) and used to affinity-enrich libraries of phage-displayed proteins
for candidates that bind to the target. Because these phage libraries are constructed in E.
coli, this system can create libraries larger than 10^ (and as large as 10^) in size. This
method has been used successfully to identify and characterize both protein-DNA and
protein-protein interactions. See, for example, Allen et al. (1995) Trends Biol. Sci. 20:511-
516; Phizicky et al. (1995) Microbiol. Rev. 59:94-123; Rebar et al. (1996) Methods Enzymol.
267:129-149; and Smith et al. (1997) Chem. Rev. 97:391-410. However, phage display
does have certain significant limitations. Unlike direct, single-step selection methods (e.g.,
the yeast one- and two-hybrid systems), phage display is an enrichment process that
requires multiple cycles to obtain desired candidates from a library. In addition, phage
display enrichments are performed in vitro (and not in vivo as in yeast one- and two-hybrid
methods). Finally, because proteins must be exported to the bacterial cell membrane in
order to be displayed on the phage surface, certain proteins (particularly larger ones) are not
well suited for analysis by phage display. This last limitation can be particularly significant
if this biological phenomenon artifactually removes certain candidates from a library.

More recently, a prokaryote-based interaction trap assay has been developed. See,
for example, U.S. Patent No. 5,925,523. The prokaryotic ITS derives in part from the
unexpected finding that the natural interaction between a transcriptional activator and
subunit(s) of an RNA polymerase complex can be replaced by a heterologous protein-
protein interaction which is capable of activating transcription. Because bacteria (E. coli in
particular) have a much higher relative transformation efficiency (typically 10^ or greater)
than yeast, the description of prokaryotic-based one- and two-hybrid systems would appear
to address the library size restrictions of the yeast systems. However, although higher
transformation efficiencies are possible in E. coli, a significant deficiency of the prior art is
that it does not make clear which, if any, reporter gene(s) have the characteristics required
for use in the analysis of libraries larger than 10^ in size. Desirable reporter genes should
have one or more of the following characteristics: 1) The reporter gene should readily
facilitate the rapid analysis of very large numbers of candidates. Thus, reporter genes
(e.g., the lacZ gene encoding beta-galactosidase) that must be screened by a visual colony
phenotype (e.g., color) are not useful because no more than 10^ to 10^ colonies can be
screened on a single agar plate and it is not practical to manually plate and assess 10^ or
more plates for each experiment. 2) The reporter gene system must be sufficiently
stringent or selective so that spurious, randomly arising background mutations do not
complicate the analysis. For example, a selection based on expression of the
spectinomycin resistance gene (aadA) would not be suitable for the analysis of large
libraries because randomly occurring mutations that result in spectinomycin resistance arise
at a frequency of approximately 10^- to 10^- (Sera and Schultz, PNAS, 93: 2920-2925
(1996); Huang et al., PNAS, 91: 3969-3973 (1994)). Thus, if one were to examine a library
of 10^ members using the aadA system, one should expect to receive 10^ or more false
positives due solely to spontaneous spectinomycin resistance. This can pose a significant
problem particularly if true positives occur with low frequency in the 10^ member library.
3) Expression of the reporter gene should be quantifiable and should easily facilitate
the selection of candidates based on any specific criteria. For example, an ideal reporter
system would allow one to isolate library members that meet specific quantitative cutoffs
(e.g. expression of reporter >50 or <50) and/or windows (e.g. expression of reporter >25
AND <75, or <25 OR >75).
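Two of the reporter criteria above lend themselves to a small illustration: the expected number of spontaneous false positives in a selection, and the quantitative cutoffs and windows on reporter expression. The sketch below is illustrative only. The helper names are hypothetical, the reporter levels are made-up values on an arbitrary scale, and the library size and resistance frequency passed to `expected_false_positives` are placeholder numbers, not figures taken from the text.

```python
from typing import Callable, List, Sequence

# A gate is a predicate on a single reporter expression level.
Gate = Callable[[float], bool]


def above(cutoff: float) -> Gate:
    """Cutoff gate: expression > cutoff."""
    return lambda level: level > cutoff


def below(cutoff: float) -> Gate:
    """Cutoff gate: expression < cutoff."""
    return lambda level: level < cutoff


def inside(low: float, high: float) -> Gate:
    """Window gate: expression > low AND < high."""
    return lambda level: low < level < high


def outside(low: float, high: float) -> Gate:
    """Window gate: expression < low OR > high."""
    return lambda level: level < low or level > high


def select(levels: Sequence[float], gate: Gate) -> List[int]:
    """Indices of library members whose reporter level passes the gate."""
    return [i for i, level in enumerate(levels) if gate(level)]


def expected_false_positives(library_size: float, spontaneous_rate: float) -> float:
    """Expected background survivors from spontaneous resistance alone."""
    return library_size * spontaneous_rate


# Hypothetical reporter readings for four library members
levels = [10.0, 30.0, 60.0, 90.0]
print(select(levels, above(50)))        # members with expression >50
print(select(levels, inside(25, 75)))   # members with >25 AND <75
print(select(levels, outside(25, 75)))  # members with <25 OR >75

# Placeholder values, chosen only to show the scaling with library size
print(expected_false_positives(library_size=1e8, spontaneous_rate=1e-6))
```

The false-positive estimate scales linearly with library size, which is why a background frequency that is tolerable for a small library can swamp the true positives in a very large one.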

There are at least two additional deficiencies in the prior art describing the
prokaryotic ITS:

A) The ability to simultaneously monitor the expression of multiple reporter
genes in a single cell. U.S. Patent Nos. 5,925,523 and 5,580,736 and others (PCT
applications WO 99/14319; WO 99/28745; WO 99/31509 and WO 99/28744; and Grossle
et al., Nature Biotechnology 17: 1232-1233 (1999)) have noted the usefulness of having the
interaction between the bait and prey constructs activate more than one reporter gene in a
single cell to reduce the occurrence of false positives. Additionally, Grossle et al., Nature
Biotechnology 17: 1232-1233 (1999) and Serebriiskii et al., J. Biol. Chem. 274: 17,080-
17,087 (1999) demonstrate a "dual bait" version of the yeast two hybrid system capable of
monitoring the interaction of two different bait proteins with a single prey protein. This
system can be used to screen for cells which have a desired combination of interactions
between a single prey protein and two bait proteins by utilizing a combination of growth
selection screens and visual lacZ screens. However, in contrast to the present invention,
those references do not teach or suggest simultaneous and independent monitoring of the
expression of multiple reporter genes in a single cell where the expression of each reporter
gene is regulated by the interaction of a single protein of interest with different partners.
For example, one may wish to select a protein (from a large library) that interacts with
Target Protein A but does NOT interact with Target Protein B. In this case, if the system
was set up such that binding of the interactor protein with Target Protein A increased the
expression of Reporter Gene A and the binding of the interactor protein with Target Protein
B increased the expression of Reporter Gene B, we would want to select those cells that
had very high expression of Reporter Gene A AND very low expression of Reporter Gene
B. Selections of this type (based on the strengths of multiple interactions) would also be
especially useful for selecting very specific DNA-binding proteins that bind well to the
desired target site but do NOT bind well to even closely related sites. We note that U.S.
Patent No. 5,925,523 does not teach how one could easily monitor multiple reporters in a
single cell and that, to our knowledge, no reference describes how to simultaneously
monitor the differential expression of multiple reporters in a single cell.
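The selection described in A), high expression of Reporter Gene A together with low expression of Reporter Gene B, amounts to a two-parameter gate applied to each cell. A minimal sketch follows, assuming each cell yields one reading per reporter. The `Cell` record, the threshold values, and the sample readings are all hypothetical and serve only to illustrate the logic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Cell:
    """One cell with independent readings for two reporters (arbitrary units)."""
    clone_id: str
    reporter_a: float  # driven by interaction with Target Protein A
    reporter_b: float  # driven by interaction with Target Protein B


def binds_a_not_b(cell: Cell, a_min: float, b_max: float) -> bool:
    """True when Reporter A is high AND Reporter B is low."""
    return cell.reporter_a >= a_min and cell.reporter_b <= b_max


def sort_cells(cells: Iterable[Cell], a_min: float, b_max: float) -> List[str]:
    """Clone IDs of cells that fall inside the high-A / low-B gate."""
    return [c.clone_id for c in cells if binds_a_not_b(c, a_min, b_max)]


# Hypothetical readings for four library members
population = [
    Cell("clone-1", reporter_a=95.0, reporter_b=4.0),   # specific for A
    Cell("clone-2", reporter_a=90.0, reporter_b=88.0),  # binds both targets
    Cell("clone-3", reporter_a=3.0, reporter_b=92.0),   # specific for B
    Cell("clone-4", reporter_a=5.0, reporter_b=6.0),    # binds neither
]

print(sort_cells(population, a_min=75.0, b_max=10.0))
```

A single-reporter readout could not separate clone-1 from clone-2 here; the second, independently measured reporter is what distinguishes a specific interactor from a promiscuous one.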

B) Methods for practicing library vs. library screening. With the wealth of
genomic information currently becoming available, a number of groups have begun to
address the challenges in library vs. library screening of large collections of coding
sequences. Ideally, a method for performing such a comprehensive library vs. library
search should: 1) provide an efficient method for crossing two large libraries and 2) be
amenable to partial or complete automation. The use of transformation as a method to
effect the simultaneous (or sequential) introduction of two libraries into either yeast or
bacterial cells fails to meet either of these criteria. Even in bacteria where very high
transformation efficiencies are possible, examination of 10^ combinations would only allow
one to examine two libraries each comprised of only 33,000 candidates. In addition, since
transformation requires pre-treatment of cells (e.g., washing and resuspension in divalent
cation solutions) and multiple protocol steps (e.g., heat shock, addition of medium,
recovery), it is not easily adaptable for automation. For library vs. library experiments
conducted in yeast, investigators have exploited the fact that yeast can exist as one of two
sexes (a and α) in haploid form. Mating of a and α cells leads to the formation of a diploid
a/α cell harboring the DNA from both the starting haploid cells. Thus, a cells harboring a
library of prey hybrids can be easily mated with α cells harboring a test bait hybrid(s)
simply by mixing the cells together and selecting for diploid cells. In this way, a large
number of combinations can be simply and rapidly tested, bypassing the need for labor-
intensive transformation experiments when crossing the libraries. See Uetz et al. (2000)
Nature 403:623-627 and Walhout et al. (2000) Science 287:116-122. Prokaryotes (and E.
coli in particular) replicate asexually, and U.S. Patent No. 5,925,523 and the existing
literature do not teach how to perform analogous library mating experiments in the
prokaryotic ITS.
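The scaling problem in B) comes from the fact that crossing two libraries produces the product of their sizes in pairwise combinations. The sketch below makes that explicit. The function names are illustrative, and the capacity of 10**9 combinations used in the second call is an assumed figure chosen because 33,000 x 33,000 is about 1.1 x 10^9; the exponent is not legible in the source text.

```python
import math


def pairwise_combinations(n_prey: int, n_bait: int) -> int:
    """Every prey member tested against every bait member."""
    return n_prey * n_bait


def largest_square_cross(capacity: int) -> int:
    """Largest equal library size n such that n * n fits within capacity."""
    return math.isqrt(capacity)


# Two libraries of 33,000 candidates each, as in the text
print(f"{pairwise_combinations(33_000, 33_000):,}")

# Assumed capacity (hypothetical): 10**9 testable combinations
print(f"{largest_square_cross(10**9):,}")

# Doubling both libraries quadruples the number of combinations
print(pairwise_combinations(66_000, 66_000) // pairwise_combinations(33_000, 33_000))
```

Because the combination count grows with the product of the two library sizes, even a large per-experiment capacity supports only modest libraries on each side, which is the motivation for a crossing method that does not depend on transformation.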

It is an object of the present invention to describe the following improvements to the 
ITS: 1) reporter genes (and methods for detecting their expression) that readily permit the
analysis of large libraries (>10^ in size) and whose selectivity can be easily "tuned,"
modified, and/or monitored, 2) methods for the simultaneous and independent measurement
of multiple interactions (as judged by expression of different reporter genes), and 3)
construction of libraries using a phagemid-based system that provides a) an efficient,
automatable method for performing library vs. library experiments and b) a method to
simplify the analysis of positive candidates from ANY screen/selection performed in the
prokaryotic ITS.

Summary of the Invention 

The present invention relates to methods and reagents for identifying, analyzing, 
modifying, and/or optimizing the affinity and/or specificity of protein-DNA and protein-
protein interactions (collectively, "interacting pairs") in cell-based systems.

In certain aspects, the subject invention provides an interaction trap assay for 
selecting interacting pairs from large libraries of potential interactors, e.g., greater than 10^ 
in size (diversity) and more preferably greater than 10^, 10^, 10^, or 10^^ in size. In one
embodiment, we have discovered that the use of reporter genes which confer selective
growth traits, rather than reporters which encode photometrically active labels or otherwise
require visual inspection for detection, allows the use of libraries large enough to
significantly improve the chance of finding interacting partners, i.e., from libraries in the
range of 10^-10^^ members. In other embodiments, the use of flow cytometry for
quantitating reporter gene expression permits the screening of large libraries, i.e., in the
range of 10^ to 10^^ members and allows one to simultaneously and independently assess in
a single cell the affinity and/or specificity of any given interaction being tested. When
designing or optimizing interactions, additional rounds of 1) mutagenesis, and 2) selection
or sorting can be used to further optimize interactions.

In certain preferred embodiments, the subject method is used to identify or optimize 
protein-DNA interactions. For example, the subject method can be used to identify mutant 
or composite DNA binding domains having desired sequence binding preferences. It can 
also be used to identify DNA sequences which are selectively bound by a given DNA 
binding protein and/or to determine the sequence specificity of a DNA binding protein. In
some cases, the method may allow simultaneous variation of both 1) the target site and 2) 
the binding protein to find pairs that work well together. 

For example, the method can be used to identify protein-DNA interactions by 
providing a host cell which contains a reporter gene encoding a growth selective marker,
operably linked to a target DNA sequence. The cell is also engineered to include a first
chimeric gene which encodes a first fusion protein including (a) a first interacting domain, 
and (b) a test DNA binding domain. The cell also expresses a second chimeric gene 
encoding a second fusion protein including (a) a second interacting domain that binds to the 
first interacting domain, and (b) an activation tag (such as a polymerase interaction domain)
which activates transcription of the selective marker gene when localized in the vicinity of
the target DNA sequence. One or both of the test DNA binding domain and the target
DNA sequence are provided in the host cell populations as variegated libraries (with respect
to sequence) to yield a library complexity of at least 10^ members. Cells in which
interaction of a test DNA binding domain and a target DNA sequence occurs can be selected
and/or amplified based on the resulting favorable growth trait conferred by the growth
selective marker. 

For example, certain embodiments relate to a method for detecting an interaction
between a first test polypeptide and a second test polypeptide. The method comprises a
step of providing an interaction trap system including a host cell which contains one or
more reporter genes operably linked to transcriptional regulatory sequences which include
one or more binding sites ("DBD recognition element") for a DNA-binding domain. The
reporter encodes a growth selection marker (defined infra). The cell is engineered to
include a first chimeric gene which encodes a first fusion protein (the "bait" protein), the
first fusion protein including a DNA-binding domain and first test polypeptide. The cell
also includes a second chimeric gene which encodes a second fusion protein (the "prey"
protein) including an activation tag (such as a polymerase interaction domain (PID) in the
prokaryotic embodiments) which activates transcription of the reporter gene when localized
to the vicinity of the DBD recognition element. Interaction of the first fusion protein and
second fusion protein in the host cell results in a growth advantage which permits the
isolation of cells including the interacting pair. Either or both of the first and second test 
polypeptides can be provided as part of a variegated library of coding sequences. 

In other embodiments, the subject method can be used to detect the interactions
between a potential DNA binding domain and a nucleic acid. The format described above
for detecting protein-protein interactions can be readily modified as follows: the first and
second test polypeptide portions of the bait and prey proteins are chosen from known
interacting pairs, and one or both of the DNA binding domains and DBD recognition
element(s) are provided as part of a variegated library of coding sequences or potential
recognition sequences. Thus the system can be used to obtain: 1) DNA binding domains
that recognize a desired target site; 2) functional binding sites for a given DNA-binding
domain; or 3) sets of functionally interacting proteins and target sites. Alternatively, when
analyzing protein-DNA interactions, the DNA binding domain can be fused directly to the
activation tag, e.g., to consolidate the bait and prey protein functions of DNA interaction
and transcriptional activation into a single protein. In a preferred embodiment, the reporter
gene is selected on the basis of its ability to provide a stringency to the detection/isolation
step which reduces the occurrence rate of breakthrough false positives to less than 1:10^,
and even more preferably less than 1:10^, 1:10^, or even 1:10^^.

Another aspect of the present invention provides methods and reagents for 
practicing various forms of interaction trap assays using flow cytometry, preferably as a
high throughput means (supra), for detecting and isolating genes encoding interacting
proteins or desired DNA binding domains. The subject "flow ITS" can be used, for
example, to screen libraries of potential protein-protein or protein-nucleic acid interactions. 

For example, certain embodiments relate to a method for detecting interaction
between a first test polypeptide and a second test polypeptide. The method comprises a
step of providing an interaction trap system including a host cell which contains one or
more reporter genes operably linked to transcriptional regulatory sequences which include
one or more binding sites ("DBD recognition element") for a DNA-binding domain. The
reporter encodes a FACS tag polypeptide (defined infra). The cell is engineered to include
a first chimeric gene which encodes a first fusion protein (the "bait" protein), the first
fusion protein including a DNA-binding domain and first test polypeptide. The cell also
includes a second chimeric gene which encodes a second fusion protein (the "prey" protein)
including an activation tag (such as a polymerase interaction domain (PID) in the
prokaryotic embodiments) which activates transcription of the reporter gene when localized
to the vicinity of the DBD recognition element. Interaction of the first fusion protein and
second fusion protein in the host cell results in measurably greater expression of the FACS
tag polypeptide. Either or both of the first and second test polypeptides can be provided as
part of a variegated library of coding sequences. Accordingly, the method also includes the
steps of isolating cells expressing the FACS tag polypeptide by fluorescence activated cell
sorting techniques.

In certain embodiments, the present invention provides a kit for detecting interaction 
between a first test polypeptide and a second test polypeptide, or between a DNA binding 
domain and a DBD recognition sequence. 

In one version of this embodiment, the kit can include a first vector for encoding a
first fusion protein ("bait fusion protein"), which vector comprises a first gene including (1)
transcriptional and translational elements which direct expression in a host cell, (2) a DNA
sequence that encodes a DNA-binding domain and which is functionally associated with the
transcriptional and translational elements of the first gene, and (3) a means for inserting a
DNA sequence encoding a first test polypeptide into the first vector in such a manner that
the first test polypeptide is capable of being expressed in-frame as part of a bait fusion
protein containing the DNA binding domain. The kit will also include a second vector for
encoding a second fusion protein ("prey fusion protein"), which comprises a second gene
including (1) transcriptional and translational elements which direct expression in a host
cell, (2) a DNA sequence that encodes an activation tag, such as a polymerase interaction
domain (PID), the activation tag DNA sequence being functionally associated with the
transcriptional and translational elements of the second gene, and (3) a means for inserting
a DNA sequence encoding the second test polypeptide into the second vector in such a
manner that the second test polypeptide is capable of being expressed in-frame as part of a
prey fusion protein containing the polymerase interaction domain. Additionally, the kit will
include a prokaryotic host cell containing a reporter gene having a binding site ("DBD
recognition element") for the DNA-binding domain, wherein the reporter gene expresses a
FACS tag polypeptide or a growth selection marker (as defined herein) when a prey fusion
protein interacts with a bait fusion protein bound to the DBD recognition element.

In another version, the kit can include a first vector for encoding the bait fusion
protein, wherein the bait fusion gene includes (1) transcriptional and translational elements
which direct expression in a host cell, (2) a DNA sequence that encodes a polypeptide (an
"interacting domain") having a known interacting partner, and (3) a means for inserting a
DNA sequence encoding a potential DNA-binding domain into the first vector in such a
manner that the potential DNA-binding domain is expressed in-frame as part of a bait
fusion protein containing the interacting domain. In certain embodiments, the kit will also
include a second vector for encoding the prey fusion protein, which comprises a second
gene including (1) transcriptional and translational elements which direct expression in a
host cell, (2) a DNA sequence that encodes an activation tag, and (3) a coding sequence for
a polypeptide which binds the interacting domain of the bait protein. However, in other
embodiments (as when studying protein-DNA interactions), the interacting domain of the
bait protein can be the activation tag, e.g., avoiding the need to generate the prey protein.
Additionally, the kit will include a prokaryotic host cell containing one or more reporter
genes having binding sites ("DBD recognition elements") for which binding or selectivity
in binding by the potential DNA-binding domain of the bait protein is sought. The host cell
population, in certain instances, can provide a library of reporter gene constructs wherein
the DBD recognition element of a reporter gene is variegated to produce a library of
potential recognition elements against which the bait protein binding is to be assessed. At
least one of the reporter genes expresses a FACS tag polypeptide or a growth selection
marker (as defined herein) when a prey fusion protein interacts with a bait fusion protein
bound to the DBD recognition element.

In certain embodiments, the subject flow ITS can be carried out using a host
engineered with two or more different reporter gene constructs encoding different FACS
tag polypeptides which can be independently and simultaneously measured. In certain
preferred embodiments, the transcriptional regulatory elements, and specifically the DBD
recognition elements, of at least two of the reporter gene constructs are different. In such
embodiments, DNA binding domains can be identified which selectively bind only a subset
of the DBD recognition elements of the reporter gene constructs. The various reporter gene
constructs can be provided on the same or separate vectors. The simultaneous expression
of the various reporter genes (whether provided on the same or separate plasmids) provides
a means for distinguishing actual interaction of the bait and prey proteins from, e.g.,
mutations or other spurious activation of the reporter gene, as well as for examining the
specificity of interaction between the interacting pair. In certain embodiments in which the
subject flow-ITS is being used to identify a DNA binding domain (as described in further
detail below), multiple reporter gene constructs can be used in order to permit isolation of
domains with selective binding activity. For example, the ITS host cell can include one or
more reporter genes having transcriptional regulatory sequences for which a DNA binding
domain is sought. At the same time, the cells can also include one or more reporter genes,
encoding different FACS markers than above (see below), under the control of
transcriptional regulatory sequences which the DBD being sought does not bind or
activate. Thus, cells harboring desired candidates can be sorted on the basis of
differential expression of the multiple classes of reporter genes. Differential protein-protein
interactions could also be distinguished in this way if: 1) the DNA-binding domain of one
fusion directs it to a particular promoter, and 2) the DNA-binding domain of the second
fusion directs it to another promoter, but 3) these two proteins have different versions of the
"interacting partner" and one wishes to 4) isolate proteins that recognize one interacting
partner preferentially over another. Similar methods could be used for cell-based selections in
yeast cells and mammalian cells.
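
By way of illustration only, the sorting of cells on differential expression of two classes
of reporter genes can be sketched as a two-channel gate. The record type, the signal
values, and the gate thresholds below are hypothetical and are not taken from this
specification; the sketch assumes one FACS marker reports binding at the sought
recognition element and a second marker reports binding at an undesired element.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: str
    target_signal: float      # marker driven by the sought DBD recognition element (arbitrary units)
    off_target_signal: float  # marker driven by an element the sought DBD should not bind


def sort_selective(cells, on_gate, off_gate):
    """Keep cells that are bright for the target marker and dim for the off-target marker."""
    return [
        c.cell_id
        for c in cells
        if c.target_signal >= on_gate and c.off_target_signal < off_gate
    ]


# Hypothetical measurements for four cells.
cells = [
    Cell("selective", 900.0, 40.0),      # activates only the target reporter
    Cell("nonselective", 850.0, 780.0),  # activates both reporters
    Cell("inactive", 30.0, 25.0),        # activates neither reporter
    Cell("spurious", 35.0, 800.0),       # activates only the off-target reporter
]

selected = sort_selective(cells, on_gate=500.0, off_gate=100.0)
print(selected)
```

Only the cell that activates the target reporter while leaving the off-target reporter at
background passes both gates; in practice the gates would be set from control populations.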

The interaction trap assays of the present invention can be used, inter alia, for
identifying protein-protein and/or protein-DNA interactions, e.g., for generating protein
linkage maps, for identifying therapeutic targets, and/or for general cloning strategies.

The ability to test very large libraries using one or more of the selection/screening
methods described in this application permits not only the analysis of large-scale
library-versus-single bait or DNA target sequence experiments, but also large-scale
library-versus-library experiments. Another aspect of the present invention describes a
method for constructing protein-encoding libraries that can be introduced into bacterial
cells without the need for transformation. Members of this library can then be "rescued"
from bacterial cells without the need to perform labor-intensive plasmid extraction, then
introduced into bacterial cells again without the need for transformation. This method is
particularly useful for library vs. library screening/selection experiments, for directed or
continuous evolution strategies, for serial selection protocols designed to reduce
background false positives, and for automating the processing and re-testing of positive
candidates from a screen/selection.

In still other embodiments, the ITS can be designed for the isolation of genes
encoding proteins which physically interact with a protein/drug or DNA/drug complex. The
method relies on detecting the reconstitution of a transcriptional activator in the presence of
the drug, such as rapamycin, FK506 or cyclosporin. In the protein-protein format, if the bait
and prey fusion proteins are able to interact in a drug-dependent manner, the interaction
may be detected by reporter gene expression. In the DNA-protein format, if the bait and
DBD recognition sequence of the reporter gene are able to interact in a drug-dependent
manner, the interaction may be detected by reporter gene expression.

Yet another aspect of the present invention relates to the use of the subject ITS
formats in the development of assays which can be used to screen for drugs which are either
agonists or antagonists of a protein-protein or protein-DNA interaction of therapeutic
consequence. In a general sense, the assay evaluates the ability of a compound to modulate
binding between a bait protein and either a prey protein or a DBD recognition sequence, as
the case may be. Exemplary compounds which can be screened include peptides, nucleic
acids, carbohydrates, small organic molecules, and natural product extract libraries, such as
those isolated from animals, plants, fungi and/or microbes. The method may also be used to
screen for compounds that regulate folding, processing, or activation of relevant proteins
(e.g., by regulating phosphorylation, ubiquitination, proteolytic processing or other
post-translational modification).

In many drug screening programs which test libraries of compounds and natural
extracts, high throughput assays are desirable in order to maximize the number of
compounds surveyed in a given period of time. The subject ITS-derived screening assays
can be carried out in such a format, and accordingly may be used as a "primary" screen.
Accordingly, in an exemplary screening assay of the present invention, an ITS is generated
to include specific bait and prey pairs or bait and DBD recognition element pairs known to
interact, and compound(s) of interest. Detection and quantification of reporter gene
expression provides a means for determining a compound's efficacy at inhibiting (or
potentiating) interaction between the interacting pairs. In certain embodiments, the
approximate efficacy of the compound can be assessed by generating dose response curves
from reporter gene expression data obtained using various concentrations of the test
compound.
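
As a minimal sketch of how such a dose response curve might be summarized, the following
estimates the concentration giving half-maximal reporter signal for an inhibitory compound
by log-linear interpolation between the two bracketing concentrations. The function names,
the normalization to a no-compound control, and all numeric values are assumptions made
for illustration; a potentiating compound would call for the mirror-image calculation, and
a fitted sigmoidal model could be substituted for the interpolation.

```python
import math


def fraction_of_control(signal, no_compound_signal, background=0.0):
    """Normalize a reporter reading to the reading obtained without test compound."""
    return (signal - background) / (no_compound_signal - background)


def estimate_ic50(concentrations, responses):
    """Interpolate, on a log10 concentration axis, where the response crosses 0.5.

    concentrations: ascending, strictly positive test-compound concentrations.
    responses: reporter signal at each concentration as a fraction of control.
    Returns None when the response never falls through 0.5 in the tested range.
    """
    points = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= 0.5 >= r_hi and r_lo != r_hi:
            t = (r_lo - 0.5) / (r_lo - r_hi)
            log_ic50 = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical reporter readings at four concentrations (arbitrary units).
concs = [0.1, 1.0, 10.0, 100.0]
raw = [980.0, 800.0, 200.0, 60.0]
responses = [fraction_of_control(s, no_compound_signal=1000.0, background=20.0) for s in raw]
ic50 = estimate_ic50(concs, responses)
print(ic50)
```

With these illustrative readings the half-maximal point falls between the second and third
concentrations tested.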

In order to make the cells permeable to certain small molecule compounds, it may
be necessary to alter the medium in which cells grow or to introduce mutations that affect the
permeability of the cell membrane (see, for example, Vaara (1992) Microbiol Rev. 56: 395-411;
Sampson et al. (1989) Genetics 122: 491-501). For example, Vaara describes the use
of various polycations and chelators for increasing the outer membrane permeability of
gram negative bacteria. Sampson et al. describe the construction of an increased
membrane permeability (imp) strain of E. coli which contains a mutation causing increased
permeability of the outer membrane.

Particular aspects and embodiments of the invention are described in more detail
below.

In a first aspect, the invention features a method for selecting an interacting pair of
test polypeptides, comprising:

i providing a population of prokaryotic host cells wherein each host cell
contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a DNA-binding domain and a first test
polypeptide,

(c) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including an activation tag and second test
polypeptide,

wherein the first fusion protein is part of a library of at least 10^ members, the
second fusion protein is part of a library of at least 10^ members, or the first and second
fusion proteins are both members of a library such that at least 10^ unique pairs of test
polypeptides could be tested for interaction;

wherein interaction of a first fusion protein and a second fusion protein in a host cell
results in a desired level of expression of the reporter gene;

wherein the desired level of expression of the reporter gene confers a growth
advantage on the host cell; and

ii isolating host cells with a growth advantage wherein said cells comprise a
first fusion protein and a second fusion protein which interact, thereby selecting an
interacting pair of test polypeptides.

In certain embodiments, the method further comprises the step of identifying nucleic
acids which encode test polypeptides which cause the desired level of expression of the
reporter gene.

In other embodiments, selective growth conditions are applied to the host cells. 

In various embodiments, the desired level of expression of the reporter gene may be
an increase, a decrease, or no change in the level of expression of the reporter gene as
compared to the basal expression level of the reporter gene.

In other embodiments, the transcriptional regulatory sequence includes at least two, 
at least three, at least four, or at least five binding sites for a DNA-binding domain. 

In various embodiments, the reporter gene encodes a gene product that gives rise to
a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance.

In another embodiment, the degree of the growth advantage conferred by the desired
level of expression of the reporter gene is controllable by varying the growth conditions of
the host cell. In a first particular embodiment, the reporter gene is the yeast His3 gene and
the degree of the growth advantage is controllable by exposing the host cell to varying
concentrations of 3-aminotriazole. In a second particular embodiment, the reporter gene is
a β-lactamase gene and the degree of the growth advantage is controllable by exposing the
host cell to a β-lactam antibiotic or to a β-lactam antibiotic and a β-lactamase inhibitor.
Examples of β-lactamase genes which may be used in accord with the invention include
TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, PSE-4 and CTX-1,
and functional fragments thereof. Examples of β-lactam antibiotics which may be used
in accord with the invention include penicillins, cephalosporins, monobactams and
carbapenems. Examples of β-lactamase inhibitors which may be used in accord with the
invention include clavulanic acid, sulbactam, tazobactam, brobactam and β-lactamase
inhibitory protein (BLIP). The β-lactam antibiotics and β-lactamase inhibitors are
generally added to the growth medium of the host cells; however, in the case of BLIP, the
inhibitory protein may be expressed within the cell in addition to being added to the growth
medium.

In certain embodiments, the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit. In other embodiments, the activation tag is a polypeptide, a
nucleic acid, or a small molecule, wherein the activation tag binds RNA polymerase, an
RNA polymerase subunit, a functional fragment of an RNA polymerase, or a functional
fragment of an RNA polymerase subunit. In still other embodiments, the activation tag
interacts indirectly with RNA polymerase via at least one intermediary polypeptide, nucleic
acid, or small molecule, which binds to the activation tag and to RNA polymerase. In a
particular embodiment, the activation tag is a fragment of Gal11P, wherein the
activation tag interacts with a fusion between Gal4 and the α subunit of RNA polymerase.

In a further embodiment, the prokaryotic host cell further contains a second reporter
gene such that interaction of a first fusion protein and a second fusion protein in a host cell
results in a desired level of expression of the second reporter gene. The desired level of
expression of the second reporter gene may be an increase, decrease, or no change in the
level of expression of the second reporter gene as compared to the basal transcription level
of the second reporter gene.

In certain embodiments, host cells are isolated wherein the desired level of
expression of the first and second reporter genes is an increase in the expression level of the
reporter genes as compared to the basal expression level of the reporter genes. In another
embodiment, host cells are isolated wherein the desired level of expression of the first
reporter gene is an increase in the expression level of the first reporter gene as compared to
the basal transcription level of the first reporter gene, and the desired level of expression of
the second reporter gene is a smaller increase in the expression level of the second reporter
gene as compared to the basal transcription level of the second reporter gene relative to the
increase in expression of the first reporter gene. In additional embodiments, host cells may
be isolated based on a decrease in the level of expression of one of the reporter genes as
compared to the basal transcription level of the reporter gene.

In another embodiment, the first and second reporter genes are operably linked to
the same transcriptional regulatory sequence. Alternatively, the first and second reporter
genes may be operably linked to separate copies of the same transcriptional regulatory
sequence. Further, the first and second reporter genes may be operably linked to different
transcriptional regulatory sequences.

In various embodiments, the second reporter gene encodes a gene product that gives
rise to a detectable signal selected from the group consisting of color, fluorescence,
luminescence, a cell surface tag, cell viability, relief of a cell nutritional requirement, cell
growth and drug resistance.

In another embodiment, the second reporter gene confers a growth advantage under 
selective conditions different from the conditions used for the first reporter gene. 

In various embodiments, the host cells containing a first and second fusion protein
capable of interacting are isolated by:

i selecting a first population of host cells with a desired expression level of the
first reporter gene followed by selecting a second population of host cells from the first
population of host cells based on a desired expression level of the second reporter gene;

ii selecting a first population of host cells with a desired expression level of the
second reporter gene followed by selecting a second population of host cells from the first
population of host cells based on a desired expression level of the first reporter gene; or

iii selecting a population of host cells based on simultaneous selection of
desired expression levels of the first and second reporter genes.
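
The three isolation orders listed above can be sketched, under strongly simplifying
assumptions, as filters over a population of measured cells. The dictionary keys,
thresholds, and readings below are hypothetical; the model treats each selection as a
fixed cutoff on a noise-free measurement, in which case all three orders retain the same
cells and differ only in how many cells reach the second step.

```python
def select(cells, reporter, threshold):
    """Keep cells whose reading for the named reporter meets the threshold."""
    return [c for c in cells if c[reporter] >= threshold]


def first_then_second(cells, t1, t2):
    # Order i: select on the first reporter, then on the second.
    return select(select(cells, "reporter1", t1), "reporter2", t2)


def second_then_first(cells, t1, t2):
    # Order ii: select on the second reporter, then on the first.
    return select(select(cells, "reporter2", t2), "reporter1", t1)


def simultaneous(cells, t1, t2):
    # Order iii: apply both criteria in a single step.
    return [c for c in cells if c["reporter1"] >= t1 and c["reporter2"] >= t2]


def ids(picked):
    return [c["id"] for c in picked]


# Hypothetical readings for three host cells.
cells = [
    {"id": "A", "reporter1": 9.0, "reporter2": 8.0},  # both reporters expressed
    {"id": "B", "reporter1": 9.0, "reporter2": 1.0},  # first reporter only
    {"id": "C", "reporter1": 1.0, "reporter2": 8.0},  # second reporter only
]

print(ids(first_then_second(cells, 5.0, 5.0)))
print(ids(second_then_first(cells, 5.0, 5.0)))
print(ids(simultaneous(cells, 5.0, 5.0)))
```

In a real experiment the choice of order would more likely be driven by throughput, for
example applying a growth selection first so that fewer cells need to be screened for the
second reporter.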

In a particular embodiment, the second reporter gene is the lacZ gene. 

In another embodiment, the second reporter gene encodes a fluorescent protein.
Examples of fluorescent proteins which may be used in accord with the invention include
green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), Renilla
reniformis green fluorescent protein, GFPmut2, GFPuv4, enhanced yellow fluorescent
protein (EYFP), enhanced cyan fluorescent protein (ECFP), enhanced blue fluorescent
protein (EBFP), citrine and red fluorescent protein from Discosoma (dsRED).

In a further embodiment, the second reporter gene encodes a protein which is
expressed on the surface of the host cell.

In certain embodiments wherein the second reporter gene encodes a protein
expressed on the surface of the host cell, the method may further comprise the steps of:

iii contacting the host cell with a fluorescently labeled antibody specific for the
protein encoded by the second reporter gene thereby labeling the host cell; and

iv isolating the cells expressing the second reporter gene using FACS analysis;

wherein steps iii and iv may occur before, after, or concurrently with step ii.

In other embodiments wherein the second reporter gene encodes a protein expressed
on the surface of the host cell, the method may further comprise the step of:

iii isolating host cells expressing the protein encoded by the second reporter
gene using affinity chromatography (e.g., using a solid support, magnetic particles, etc.),

wherein isolation of the host cells based on expression of the second reporter gene
may occur before or after isolation of the host cells based on a desired level of expression
of the first reporter gene.

In various particular embodiments, the first reporter gene is selected from the group
consisting of the yeast His3 gene and a β-lactamase gene and the second reporter gene is
selected from the group consisting of the lacZ gene, a fluorescent protein, a protein which is
expressed on the surface of the host cell and the bacterial aadA gene.

In one embodiment, the first and second fusion proteins are expressed from the
same nucleic acid construct. In another embodiment, the first and second fusion proteins
are expressed from separate nucleic acid constructs.

In a further embodiment, the expression level of the first, second, or first and second
fusion proteins can be controlled by varying the growth conditions of the host cell. For
example, in a particular embodiment, the expression level of the first and second fusion
proteins can be controlled by varying the concentration of IPTG, anhydrotetracycline, or
IPTG and anhydrotetracycline to which the host cell is exposed. In another embodiment,
the first, second, or first and second fusion proteins are expressed from a promoter
comprising a binding site for the lac repressor or the tet repressor. In certain embodiments,
the expression level of the first and second fusion protein can be independently controlled.

In one embodiment, the first fusion protein is part of a library of at least 10^
members, the second fusion protein is part of a library of at least 10^ members, or the first
and second fusion proteins are both members of a library such that at least 10^ unique pairs
of test polypeptides could be tested for interaction.

In another embodiment, the first fusion protein is part of a library of at least 10^
members, the second fusion protein is part of a library of at least 10^ members, or the first
and second fusion proteins are both members of a library such that at least 10^ unique pairs
of test polypeptides could be tested for interaction.

In yet another embodiment, the first fusion protein is part of a library of at least 10^^
members, the second fusion protein is part of a library of at least 10^^ members, or the first
and second fusion proteins are both members of a library such that at least 10^^ unique pairs
of test polypeptides could be tested for interaction.

In a further embodiment, the first fusion protein is part of a library of at least 10^^
members, the second fusion protein is part of a library of at least 10^^ members, or the first
and second fusion proteins are both members of a library such that at least 10^^ unique pairs
of test polypeptides could be tested for interaction.
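
To illustrate how the number of unique pairs relates to the two library sizes, the
following sketch multiplies hypothetical library sizes and estimates, under a simple
assumption of uniform random sampling, what fraction of the possible pairs would be
represented at least once among a given number of host cells. The library sizes, the
coverage figure, and the uniform-sampling model are assumptions made for illustration and
are not values recited in this specification.

```python
import math


def unique_pairs(first_library_size, second_library_size):
    """Number of distinct (first, second) test polypeptide pairs that could be formed."""
    return first_library_size * second_library_size


def expected_fraction_sampled(n_cells, n_pairs):
    """Expected fraction of pairs seen at least once if each cell carries a
    uniformly random pair (Poisson approximation: 1 - exp(-n_cells / n_pairs))."""
    return 1.0 - math.exp(-n_cells / n_pairs)


# Hypothetical sizes: two libraries of ten thousand members each.
pairs = unique_pairs(10**4, 10**4)

# Hypothetical experiment screening three cells per possible pair.
fraction = expected_fraction_sampled(3 * pairs, pairs)

print(pairs)
print(round(fraction, 3))
```

Under these assumptions, screening three cells per possible pair would be expected to
represent roughly 95% of the pairs, which is one reason the capacity to handle very large
cell populations matters for library-versus-library experiments.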

In certain embodiments, the prokaryotic host cell is selected from the group
consisting of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas,
Salmonella, Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella.

In another embodiment, the reporter gene construct and/or the chimeric gene
constructs may be contained within a vector for introduction into the host cell. In particular
embodiments, the vector may be a plasmid or a phagemid. Phagemid vectors are generally
used in conjunction with a host cell that expresses a functional F pilus. Particular examples
of phagemids which may be used in accord with the invention include pBluescript II SK+ or
pBR-GP-Z12BbsI, or derivatives or precursors thereof. When a phagemid vector is being
utilized, the phagemid may be introduced into the host cell by infection of the host cell with
infectious phage containing the phagemid vector in combination with a helper filamentous
phage. Examples of helper filamentous phage which may be used in accord with the
invention include M13K07, VCS-M13, M13, and f1, and derivatives thereof.

In another aspect, the invention features a method for identifying agents which
modulate a protein-protein interaction, comprising:

i providing a population of prokaryotic host cells wherein each host cell
contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a DNA-binding domain and a first test
polypeptide,

(c) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including an activation tag and second test
polypeptide,

wherein the prokaryotic host cell is an imp' or gram positive strain of bacteria;

wherein interaction of a first fusion protein and a second fusion protein in a host cell
results in a desired level of expression of the reporter gene;

ii contacting the host cell with at least one test agent; and

iii identifying test agents which modulate expression of the reporter gene in a
manner also dependent on the expression of the first and second test polypeptides,
thereby identifying agents which modulate a protein-protein interaction.

In various embodiments, the reporter gene encodes a gene product that gives rise to
a detectable signal selected from the group consisting of color, fluorescence, luminescence,
a cell surface tag, cell viability, relief of a cell nutritional requirement, cell growth and drug
resistance.

In another embodiment, the method further comprises comparing the level of
expression of the reporter gene to a level of expression in a control experiment wherein one
or both of the test polypeptides are absent or altered so as to preclude interaction of the first
and second fusion proteins.

In other embodiments, the test agent is selected from the group consisting of
peptides, nucleic acids, carbohydrates, natural product extract libraries, and small organic
molecules. Additionally, the test agent may be part of a library of test agents. In a
particular embodiment, the library of test agents has at least 10^, 10^, 10^, 10^^ or 10^^
members.

In further embodiments, test agents are identified which agonize or antagonize the 
protein-protein interaction based on a change in the expression level of the reporter gene in 
the presence of the test agent. 
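
One way the agonist/antagonist call might be made from reporter measurements is
sketched below. The fold-change cutoff, the labels, and the readings are hypothetical,
and the sketch assumes that increased reporter expression reflects a strengthened
interaction. It also uses a control lacking a functional interaction to flag agents that
shift reporter expression on their own.

```python
def classify_agent(with_agent, without_agent,
                   control_with_agent, control_without_agent,
                   min_fold=2.0):
    """Classify a test agent from reporter readings (all assumed positive).

    with_agent / without_agent: readings from cells expressing both fusion proteins.
    control_*: readings from control cells in which the interaction is precluded.
    """
    fold = with_agent / without_agent
    control_fold = control_with_agent / control_without_agent

    # An agent that shifts the control as well is acting on the reporter itself
    # rather than on the interaction between the test polypeptides.
    if max(control_fold, 1.0 / control_fold) >= min_fold:
        return "nonspecific"
    if fold >= min_fold:
        return "agonist"
    if fold <= 1.0 / min_fold:
        return "antagonist"
    return "no effect"


# Hypothetical readings (arbitrary units).
print(classify_agent(400.0, 100.0, 10.0, 10.0))
print(classify_agent(20.0, 100.0, 10.0, 10.0))
print(classify_agent(400.0, 100.0, 40.0, 10.0))
print(classify_agent(110.0, 100.0, 10.0, 10.0))
```

The cutoff would in practice be chosen from the measurement noise of the particular
reporter and host strain.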

In certain embodiments, the host cells may be grown under conditions which
increase the permeability of the cell membrane.

In another aspect, the invention features a method for selecting a polypeptide which 
differentially interacts with at least two different test polypeptides, comprising: 

i providing a population of prokaryotic host cells wherein each cell contains 

(a) a first reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a first DNA-binding domain, 

(b) a second reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a second DNA-binding domain, 

(c) a first chimeric gene which encodes a first fusion protein, the first 
fusion protein including a first DNA-binding domain and a first test 
polypeptide, 

(d) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including a second DNA-binding domain and a
second test polypeptide,

(e) a third chimeric gene which encodes a third fusion protein, the third
fusion protein including an activation tag and third test polypeptide,

wherein the third fusion protein is part of a library of at least 10^ members;

wherein interaction of the first fusion protein and the third fusion protein in the host
cell results in a desired level of expression of the first reporter gene;

wherein interaction of the second fusion protein and the third fusion protein in the
host cell results in a desired level of expression of the second reporter gene; and

ii isolating host cells comprising a third fusion protein capable of interacting
with the first fusion protein, the second fusion protein, or the first and the second
fusion proteins based on a desired level of expression of the first reporter gene, the
second reporter gene, or the first and second reporter genes, respectively, thereby
selecting a polypeptide which differentially interacts with at least two different test
polypeptides.

In one embodiment, host cells are isolated which comprise a third fusion protein that
interacts with both the first and second fusion proteins. Alternatively, host cells may be
isolated which comprise a third fusion protein that interacts to a greater extent with one of
the polypeptides as compared to the other polypeptide.

In other embodiments, the host cell may further comprise additional fusion proteins
(2, 3, 4, or 5) which may be tested for interaction with a target fusion protein. Interaction of
the target protein with one or more of the test fusion proteins may be determined based on
the level of expression of one or more reporter genes. In certain embodiments, all of the
reporter genes are the same. In other embodiments, interaction of the target fusion protein
with a first test protein affects the expression of a first reporter gene whereas interaction of
the target fusion protein with any of the other fusion proteins affects the expression of a
second reporter gene (e.g., interaction of the target protein with a second, third, fourth, or
fifth test protein affects the expression level of a single reporter gene different from the first
reporter gene). In still further embodiments, interaction of the target fusion protein with up
to five different test proteins affects the expression of up to five different reporter genes
(e.g., interaction of the target protein with each of a first, second, third, fourth, and/or fifth
test proteins affects the expression of a first, second, third, fourth, and/or fifth reporter gene,
respectively, wherein each of the reporter genes is different and has a unique detectable
signal). In various embodiments, host cells are isolated which contain (i) a target protein
that interacts to a desired extent with all of the other fusion proteins; (ii) a target protein that
interacts with one of the polypeptides to a greater extent than it interacts with the other
fusion proteins; or (iii) a target protein that interacts to a desired extent with a desired
combination of at least two of the other fusion proteins.

In various embodiments, the desired level of expression of at least one of the
reporter genes is an increase, a decrease, or no change in the level of expression of the
reporter gene as compared to the basal transcription level of the reporter gene. In a
particular embodiment, the desired level of expression of one of the reporter genes is an
increase in the level of expression of the reporter gene as compared to the basal
transcription level of the reporter gene and the desired level of expression of the other
reporter genes is no change in expression in any of the other reporter genes as compared to
the basal transcription levels of the other reporter genes.

In various embodiments, the reporter genes encode unique detectable proteins which
can be analyzed independently, simultaneously, or independently and simultaneously. In
certain embodiments, at least one of the reporter genes encodes a fluorescent protein. In
another embodiment, the expression level of at least one of the reporter genes may be
analyzed by FACS.

In certain embodiments, the method further comprises the step of identifying nucleic 
acids which encode fusion proteins resulting in a desired level of expression of the desired 
reporter genes. 

In a further aspect, the invention features a method for selecting a test agent that
differentially modulates the interaction of a polypeptide with at least two different test
polypeptides, comprising:

i providing a population of prokaryotic host cells wherein each cell contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a first DNA-binding domain and a first test
polypeptide,

(d) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including a second DNA-binding domain and a
second test polypeptide,

(e) a third chimeric gene which encodes a third fusion protein, the third
fusion protein including an activation tag and third test polypeptide,

wherein the host cell is an imp" or gram positive strain of bacteria;

wherein interaction of the first fusion protein and the third fusion protein in the host
cell results in a desired level of expression of the first reporter gene;

wherein interaction of the second fusion protein and the third fusion protein in the
host cell results in a desired level of expression of the second reporter gene;

ii contacting the host cell with at least one test agent; and

iii identifying test agents which modulate the expression of the first, second, or
first and second reporter genes in a manner also dependent on the expression of the first,
second and third test polypeptides, thereby selecting a test agent that differentially
modulates the interaction of a polypeptide with at least two different test polypeptides.

In another aspect, the invention features a method for detecting an interaction
between a test polypeptide and a DNA sequence, comprising:

i providing a population of prokaryotic host cells wherein each cell contains

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the DBD recognition element is part of a library of at least 10^ members,
the fusion protein is part of a library of at least 10^ members, or the DBD recognition
element and the fusion protein are both members of a library such that at least 10^ unique
pairs of a DBD recognition element and a fusion protein could be tested for interaction;

wherein interaction between a test polypeptide of a fusion protein and a DBD
recognition element in a host cell results in a desired level of expression of the reporter
gene;

wherein the desired level of expression of the reporter gene confers a growth
advantage on the host cell; and

ii isolating host cells with a growth advantage wherein said cells comprise a
fusion protein and a DBD recognition element which interact, thereby detecting an
interaction between a test polypeptide and a DNA sequence.

In certain embodiments, the method further comprises the step of identifying the 
nucleic acid which encodes a test polypeptide that interacts with the DBD recognition 
element DNA sequence. 

In other embodiments, selective growth conditions are applied to the host cells.

In various embodiments, the desired level of expression of the reporter gene may be
an increase, a decrease, or no change in the level of expression of the reporter gene as
compared to the basal expression level of the reporter gene.

In other embodiments, the transcriptional regulatory sequence includes at least two,
at least three, at least four, or at least five binding sites for a DNA-binding domain.

In various embodiments, the reporter gene encodes a gene product that gives rise to
a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance.

In another embodiment, the degree of the growth advantage conferred by the desired
level of expression of the reporter gene is controllable by varying the growth conditions of
the host cell. In a first particular embodiment, the reporter gene is the yeast His3 gene and
the degree of the growth advantage is controllable by exposing the host cell to varying
concentrations of 3-aminotriazole. In a second particular embodiment, the reporter gene is
a β-lactamase gene and the degree of the growth advantage is controllable by exposing the
host cell to a β-lactam antibiotic or to a β-lactam antibiotic and a β-lactamase inhibitor.
Examples of β-lactamase genes which may be used in accord with the invention include
TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, PSE-4 and CTX-1,
and functional fragments thereof. Examples of β-lactam antibiotics which may be used
in accord with the invention include penicillins, cephalosporins, monobactams and
carbapenems. Examples of β-lactamase inhibitors which may be used in accord with the
invention include clavulanic acid, sulbactam, tazobactam, brobactam and β-lactamase
inhibitory protein (BLIP). The β-lactam antibiotics and β-lactamase inhibitors are
generally added to the growth medium of the host cells; however, in the case of BLIP, the
inhibitory protein may be expressed within the cell in addition to being added to the growth
medium.

In certain embodiments, the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit. In other embodiments, the activation tag is a polypeptide, a
nucleic acid, or a small molecule, wherein the activation tag binds RNA polymerase, an
RNA polymerase subunit, a functional fragment of an RNA polymerase, or a functional
fragment of an RNA polymerase subunit. In still other embodiments, the activation tag
interacts indirectly with RNA polymerase via at least one intermediary polypeptide, nucleic
acid, or small molecule, which binds to the activation tag and to RNA polymerase. In a
particular embodiment, the activation tag is a fragment of Gal11P, wherein the
activation tag interacts with a fusion between Gal4 and the α subunit of RNA polymerase.

In a further embodiment, the prokaryotic host cell further contains a second reporter
gene such that interaction of the fusion protein with a second binding site for a
DNA-binding domain in a host cell results in a desired level of expression of the second reporter
gene. The desired level of expression of the second reporter gene may be an increase,
decrease, or no change in the level of expression of the second reporter gene as compared to
the basal transcription level of the second reporter gene.

In certain embodiments, host cells are isolated wherein the desired level of
expression of the first and second reporter genes is an increase in the expression level of the
reporter genes as compared to the basal expression level of the reporter genes. In another
embodiment, host cells are isolated wherein the desired level of expression of the first
reporter gene is an increase in the expression level of the first reporter gene as compared to
the basal transcription level of the first reporter gene, and the desired level of expression of
the second reporter gene is a smaller increase in the expression level of the second reporter
gene as compared to the basal transcription level of the second reporter gene relative to the
increase in expression of the first reporter gene. In additional embodiments, host cells may
be isolated based on a decrease in the level of expression of one of the reporter genes as
compared to the basal transcription level of the reporter gene.

In another embodiment, the first and second reporter genes are operably linked to 
the same transcriptional regulatory sequence. Alternatively, the first and second reporter 
genes may be operably linked to separate copies of the same transcriptional regulatory 
sequence. Further, the first and second reporter genes may be operably linked to different
transcriptional regulatory sequences.

In various embodiments, the second reporter gene encodes a gene product that gives 
rise to a detectable signal selected from the group consisting of color, fluorescence,
luminescence, a cell surface tag, cell viability, relief of a cell nutritional requirement, cell
growth and drug resistance. 

In another embodiment, the second reporter gene confers a growth advantage under
selective conditions different from the conditions used for the first reporter gene. In other
embodiments, host cells are isolated using FACS.

In various embodiments, the host cells containing a fusion protein capable of
interacting with a DNA sequence are isolated by:

i selecting a first population of host cells with a desired expression level of the
first reporter gene followed by selecting a second population of host cells from the first
population of host cells based on a desired expression level of the second reporter gene;

ii selecting a first population of host cells with a desired expression level of the
second reporter gene followed by selecting a second population of host cells from the first
population of host cells based on a desired expression level of the first reporter gene; or

iii selecting a population of host cells based on simultaneous selection of
desired expression levels of the first and second reporter genes.
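
The three isolation routes above can be sketched in code. The following Python sketch is illustrative only and is not part of the disclosed methods: the `Cell` record, the gate tuples, and the expression values are hypothetical, and each selection step is idealized as an exact threshold test on a measured reporter level.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Cell:
    """Hypothetical record of one host cell and its two reporter readings."""
    clone_id: str
    reporter1: float  # first reporter expression, arbitrary units
    reporter2: float  # second reporter expression, arbitrary units


def in_gate(value, gate):
    """Return True when value lies inside the inclusive (low, high) gate."""
    low, high = gate
    return low <= value <= high


def select_sequential(cells, first_attr, first_gate, second_attr, second_gate):
    """Routes i and ii: gate on one reporter, then gate the survivors on the other."""
    first_population = [c for c in cells if in_gate(getattr(c, first_attr), first_gate)]
    return [c for c in first_population if in_gate(getattr(c, second_attr), second_gate)]


def select_simultaneous(cells, gate1, gate2):
    """Route iii: apply both gates in a single pass."""
    return [c for c in cells if in_gate(c.reporter1, gate1) and in_gate(c.reporter2, gate2)]


# Hypothetical population: only clone A expresses both reporters strongly.
cells = [Cell("A", 9.0, 8.0), Cell("B", 9.5, 1.0), Cell("C", 0.5, 7.5)]
high = (5.0, inf)  # assumed gate meaning "increased expression"

route_i = select_sequential(cells, "reporter1", high, "reporter2", high)
route_ii = select_sequential(cells, "reporter2", high, "reporter1", high)
route_iii = select_simultaneous(cells, high, high)
print([c.clone_id for c in route_i])  # ['A']
```

In this idealized model all three routes return the same cells. Real selections are noisy, so the routes need not agree in practice, and the gates may equally be set to capture a decrease or no change in expression.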

In a particular embodiment, the second reporter gene is the lacZ gene.

In another embodiment, the second reporter gene encodes a fluorescent protein.
Examples of fluorescent proteins which may be used in accord with the invention include
green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), Renilla
reniformis green fluorescent protein, GFPmut2, GFPuv4, enhanced yellow fluorescent
protein (EYFP), enhanced cyan fluorescent protein (ECFP), enhanced blue fluorescent
protein (EBFP), citrine and red fluorescent protein from Discosoma (dsRED).

In a further embodiment, the second reporter gene encodes a protein which is 
expressed on the surface of the host cell. 

In various particular embodiments, the first reporter gene is selected from the group
consisting of the yeast His3 gene and a β-lactamase gene and the second reporter gene is
selected from the group consisting of the lacZ gene, a fluorescent protein, a protein which is
expressed on the surface of the host cell and the bacterial aadA gene.

In certain embodiments, the prokaryotic host cell is selected from the group
consisting of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas,
Salmonella, Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella.

In another embodiment, the reporter gene construct and/or the chimeric gene
construct may be contained within a vector for introduction into the host cell. In particular
embodiments, the vector may be a plasmid or a phagemid. Phagemid vectors are generally
used in conjunction with a host cell that expresses a functional F pilus. Particular examples
of phagemids which may be used in accord with the invention include pBluescriptIISK+ or
pBR-GP-Z12BbsI, or derivatives or precursors thereof. When a phagemid vector is being
utilized, the phagemid may be introduced into the host cell by infection of the host cell with
infectious phage containing the phagemid vector in combination with a helper filamentous
phage. Examples of helper filamentous phage which may be used in accord with the
invention include M13K07, VCS-M13, M13, and f1, and derivatives thereof.

In various embodiments, the DBD recognition element is a member of a library of at
least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 potential binding sites for a DNA binding domain,
wherein host cells comprising a DBD recognition element bound by a test polypeptide are
isolated. Alternatively, the DBD recognition element is a desired binding site for a DNA
binding domain and the test polypeptide is a member of a library of at least 10^7, 10^8, 10^9,
10^10, 10^11, or 10^12 polypeptides, wherein host cells comprising a polypeptide which binds to
the DBD recognition element are isolated. In a further embodiment, the DBD recognition
element is a member of a library of potential binding sites for a DNA binding domain and the
test polypeptide is a member of a library of polypeptides, wherein host cells comprising a
polypeptide that binds a DBD recognition element are isolated.

In certain embodiments, the polypeptides are zinc finger proteins. In other
embodiments, binding sites for a DNA binding domain bind a zinc finger protein. The
methods of the invention may be used to find DNA sequences which bind to a known or
novel zinc finger protein. Alternatively, the methods of the invention may be used to
isolate known or novel polypeptides which bind to a test DNA sequence.

In another aspect, the invention features a method for identifying agents which
modulate an interaction between a test polypeptide and a DNA sequence, comprising

i providing a population of prokaryotic host cells wherein each cell contains

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the prokaryotic host cell is an imp- or gram positive strain of bacteria;

wherein interaction between a test polypeptide of a fusion protein and a DBD
recognition element in a host cell results in a desired level of expression of the reporter
gene;

ii contacting the host cell with at least one test agent; and

iii identifying agents which modulate expression of the reporter gene in a
manner also dependent on the presence of a fusion protein and a DBD recognition element.

In various embodiments, the reporter gene encodes a gene product that gives rise to
a detectable signal selected from the group consisting of color, fluorescence, luminescence, 
a cell surface tag, cell viability, relief of a cell nutritional requirement, cell growth and drug 
resistance. 

In other embodiments, the DBD recognition element is part of a library of at least
10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 members, the fusion protein is part of a library of at least
10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 members, or the DBD recognition element and the fusion
protein are both members of a library such that at least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12
unique pairs of a DBD recognition element and a fusion protein could be tested for
interaction.
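
As a worked illustration of the "unique pairs" count, the following Python sketch multiplies two library sizes. It assumes that every DBD recognition element can in principle be paired with every fusion protein; the function names and the library sizes used are hypothetical and chosen only for the arithmetic.

```python
def unique_pairs(n_recognition_elements: int, n_fusion_proteins: int) -> int:
    """Number of distinct (recognition element, fusion protein) pairs
    when both components are drawn from libraries."""
    return n_recognition_elements * n_fusion_proteins


def min_partner_library_size(target_pairs: int, other_library_size: int) -> int:
    """Smallest partner library that reaches target_pairs (ceiling division)."""
    return -(-target_pairs // other_library_size)


# Hypothetical sizes: 10^4 recognition elements crossed with 10^5 fusion proteins.
print(unique_pairs(10**4, 10**5))              # 1000000000
print(min_partner_library_size(10**9, 10**4))  # 100000
```

The product is an upper bound on the combinations available for testing; how many are actually sampled depends on the number of host cells screened.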

In another embodiment, the method further comprises comparing the level of
expression of the reporter gene to a level of expression in a control experiment wherein one
or both of the test polypeptide and the DBD recognition element are absent or altered so as
to preclude interaction of the fusion protein and the DBD recognition element.
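
The control comparison can be sketched as a simple decision rule. The Python sketch below is an assumption-laden illustration, not the disclosed procedure: the function name, the two-fold `threshold`, and the result labels are hypothetical. It treats an agent as acting on the interaction only when reporter expression changes in the test cells but not in control cells in which the interaction is precluded.

```python
def classify_agent(test_with, test_without, control_with, control_without,
                   threshold=2.0):
    """Classify a test agent from four reporter readings (arbitrary units).

    test_*    : cells carrying both the fusion protein and the DBD recognition element
    control_* : cells in which the interaction is precluded
    *_with / *_without : reading in the presence / absence of the agent
    threshold : assumed fold change that counts as a real change
    """
    readings = (test_with, test_without, control_with, control_without)
    if min(readings) <= 0:
        raise ValueError("reporter readings must be positive")

    test_ratio = test_with / test_without
    control_ratio = control_with / control_without

    def changed(ratio):
        return ratio >= threshold or ratio <= 1.0 / threshold

    if not changed(test_ratio):
        return "no effect"
    if changed(control_ratio):
        # The agent alters the reporter even without the interaction.
        return "nonspecific"
    return "specific increase" if test_ratio > 1.0 else "specific decrease"


print(classify_agent(20.0, 100.0, 10.0, 10.0))  # specific decrease
print(classify_agent(20.0, 100.0, 2.0, 10.0))   # nonspecific
```

Whether a specific increase or decrease corresponds to agonism or antagonism depends on how the desired level of reporter expression is defined for the assay in question.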

In certain embodiments, the test agent is selected from the group consisting of
peptides, nucleic acids, carbohydrates, natural product extract libraries, and small organic
molecules. Additionally, the test agent may be part of a library of test agents.

In other embodiments, test agents are identified which agonize or antagonize the 
protein-nucleic acid interaction based on a change in the expression level of the reporter
gene in the presence of the test agent.

In certain embodiments, the host cells are grown under conditions which increase
the permeability of the cell membrane.

In a further aspect, the invention features a method for selecting a polypeptide that
differentially interacts with at least two different DNA sequences, comprising

i providing a population of prokaryotic host cells each of which contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the fusion protein is part of a library of at least 10^ members; 

wherein interaction of a fusion protein with the first DBD recognition element in the
host cells results in a desired level of expression of the first reporter gene;

wherein interaction of a fusion protein with the second DBD recognition element in
the host cells results in a desired level of expression of the second reporter gene; and

ii isolating host cells comprising a fusion protein that interacts with the first
DBD recognition element, the second DBD recognition element, or the first and second
DBD recognition elements based on a desired level of expression of the first reporter gene,
the second reporter gene, or the first and second reporter genes, respectively, thereby
selecting a polypeptide that differentially interacts with at least two different DNA
sequences.



In various embodiments, the fusion protein is assayed for the ability to interact with
at least two, three, four or five different DNA sequences each operably linked to reporter
genes. In certain embodiments, the reporter genes are operably linked to the same
transcriptional regulatory sequence. Alternatively, the reporter genes are operably linked to
separate copies of the same transcriptional regulatory sequence. Further, the reporter genes
may be operably linked to different transcriptional regulatory sequences.

In various embodiments, all of the reporter genes encode different proteins and each 
reporter gene may be detected independently, simultaneously, or independently and 
simultaneously. 

In certain embodiments, the method further comprises the step of isolating the
nucleic acid which encodes at least one of the fusion proteins.

In various embodiments, the desired level of expression of at least one of the 
reporter genes is an increase, a decrease, or no change in reporter gene expression as 
compared to the basal transcription level of the reporter gene. 

In a particular embodiment, host cells are isolated that have one reporter gene whose
expression level is increased to a greater extent than the increase in the expression level of
the other reporter genes, as compared to the basal level of expression of the reporter genes.
In some cases, the reporter genes whose expression levels increase to a lesser extent (or not
at all) are all the same reporter gene and are different from the reporter gene whose level is
increased.
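
The differential criterion described above can be expressed as a comparison of fold increases over basal expression. The following Python sketch is a minimal illustration under stated assumptions: the reporter names, the readings, and the `margin` parameter (how much larger the favored reporter's increase must be) are hypothetical and are not specified in the text.

```python
def fold_increase(expression, basal):
    """Fold change of each reporter relative to its own basal level."""
    return {name: expression[name] / basal[name] for name in expression}


def is_differential(expression, basal, target, margin=2.0):
    """True when the target reporter is increased over basal and its fold
    increase exceeds that of every other reporter by at least margin."""
    folds = fold_increase(expression, basal)
    others = [value for name, value in folds.items() if name != target]
    return folds[target] > 1.0 and all(folds[target] >= margin * v for v in others)


# Hypothetical readings for one host cell carrying two reporters.
basal = {"reporter1": 5.0, "reporter2": 4.0}
measured = {"reporter1": 50.0, "reporter2": 6.0}

print(is_differential(measured, basal, target="reporter1"))  # True
print(is_differential(measured, basal, target="reporter2"))  # False
```

Here the first reporter rises 10-fold and the second 1.5-fold, so the cell would be kept as preferentially activating the first reporter.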

In another aspect, the invention features a method for selecting a test agent that 
differentially modulates the interaction of a polypeptide with at least two different DNA 
sequences, comprising 

i providing a population of prokaryotic host cells each of which contains 

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the prokaryotic host cell is an imp- or gram positive strain of bacteria;

wherein interaction of a fusion protein with the first DBD recognition element in the
host cells results in a desired level of expression of the first reporter gene;

wherein interaction of a fusion protein with the second DBD recognition element in
the host cells results in a desired level of expression of the second reporter gene;

ii contacting the host cell with at least one test agent; and

iii identifying test agents which modulate the expression of the first, second, or
first and second reporter genes in a manner also dependent on the presence of the fusion
protein and the first and second DBD recognition elements, thereby selecting a test agent
that differentially modulates the interaction of a polypeptide with at least two different
DNA sequences.

In various embodiments, the DBD recognition element may be part of a library of at
least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 members, the fusion protein may be part of a library of
at least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 members, or the DBD recognition element and the
fusion protein may both be members of a library such that at least 10^7, 10^8, 10^9, 10^10, 10^11,
or 10^12 unique pairs of a DBD recognition element and a fusion protein could be tested for
interaction.

In yet another aspect, the invention features a method for detecting an interaction 
between a test RNA binding domain polypeptide and an RNA sequence, comprising 

i providing a population of prokaryotic host cells wherein each cell contains 

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a first chimeric gene which encodes a fusion protein, the fusion
protein including a DNA-binding domain and a first RNA binding
domain,

(c) a second chimeric gene which encodes a fusion protein, the fusion
protein including an activation tag and a second RNA binding
domain,

(d) a third chimeric gene which encodes a hybrid RNA, the hybrid RNA
comprising a first RNA sequence that binds one of the first or second
RNA binding domains and a second RNA sequence to be tested for
interaction with the RNA-binding domain not bound to the first RNA
sequence;

wherein the RNA-binding domain not bound to the first RNA sequence is part of a
library of at least 10^ members, the second RNA sequence is part of a library of at least 10^
members, or the RNA-binding domain not bound to the first RNA sequence and the second
RNA sequence are both members of a library such that at least 10^ unique pairs of an
RNA-binding domain and an RNA sequence could be tested for interaction;

wherein interaction of an RNA-binding domain not bound to the first RNA
sequence with the second RNA sequence in a host cell results in a desired level of
expression of the reporter gene; and

ii isolating host cells comprising an RNA-binding domain that interacts with 
the second RNA sequence based on a desired level of expression of the reporter gene 
thereby detecting an interaction between a test RNA binding domain polypeptide and an 
RNA sequence. 

In another aspect, the invention features a kit for selecting a polypeptide that
interacts with a test polypeptide, comprising:

i a first gene construct for encoding a first fusion protein, which first gene
construct comprises:

(a) transcriptional and translational elements which direct expression of
a protein in a prokaryotic host cell,

(b) a DNA sequence that encodes a DNA binding domain and which is
operably linked with the transcriptional and translational elements of 
the first gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a first test 
polypeptide into the first gene construct in such a manner that the 
first test polypeptide is expressed in-frame as part of a fusion protein
containing the DNA binding domain; 

ii a second gene construct for encoding a second fusion protein, which second
gene construct comprises: 

(a) transcriptional and translational elements which direct expression of 
a protein in a prokaryotic host cell, 

(b) a DNA sequence that encodes an activation tag and which is 
operably linked with the transcriptional and translational elements of 
the second gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a second 
test polypeptide into the second gene construct in such a manner that
the second test polypeptide is expressed in-frame as part of a fusion
protein containing the activation tag; and 

iii a prokaryotic host cell containing at least one reporter gene having one or 
more binding sites (DBD recognition elements) for the DNA binding domain, and 

wherein a desired level of expression of the reporter gene is obtained upon 
interaction of the first and second fusion proteins; and

wherein the desired level of expression of the reporter gene confers a growth
advantage on the host cell. 

In certain embodiments, the degree of growth advantage conferred by the expression
of a desired level of the reporter gene may be controlled by varying the growth conditions
of the cell.

In other embodiments, the transcriptional regulatory sequence includes at least two,
at least three, at least four, or at least five binding sites for a DNA-binding domain.

In various embodiments, the reporter gene encodes a gene product that gives rise to 
a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance. 

In another embodiment, the degree of the growth advantage conferred by the desired
level of expression of the reporter gene is controllable by varying the growth conditions of
the host cell. In a first particular embodiment, the reporter gene is the yeast His3 gene and
the degree of the growth advantage is controllable by exposing the host cell to varying
concentrations of 3-aminotriazole. In a second particular embodiment, the reporter gene is
a β-lactamase gene and the degree of the growth advantage is controllable by exposing the
host cell to a β-lactam antibiotic or to a β-lactam antibiotic and a β-lactamase inhibitor.
Examples of β-lactamase genes which may be used in accord with the invention include
TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, PSE-4 and CTX-1,
and functional fragments thereof. Examples of β-lactam antibiotics which may be used
in accord with the invention include penicillins, cephalosporins, monobactams and
carbapenems. Examples of β-lactamase inhibitors which may be used in accord with the
invention include clavulanic acid, sulbactam, tazobactam, brobactam and β-lactamase
inhibitory protein (BLIP). The β-lactam antibiotics and β-lactamase inhibitors are
generally added to the growth medium of the host cells; however, in the case of BLIP, the
inhibitory protein may be expressed within the cell in addition to being added to the growth
medium.

In certain embodiments, the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit. In other embodiments, the activation tag is a polypeptide, a
nucleic acid, or a small molecule, wherein the activation tag binds RNA polymerase, an
RNA polymerase subunit, a functional fragment of an RNA polymerase, or a functional
fragment of an RNA polymerase subunit. In still other embodiments, the activation tag
interacts indirectly with RNA polymerase via at least one intermediary polypeptide, nucleic
acid, or small molecule, which binds to the activation tag and to RNA polymerase. In a
particular embodiment, the activation tag is a fragment of Gal11P, wherein the
activation tag interacts with a fusion between Gal4 and the α subunit of RNA polymerase.

In a further aspect, the invention features a kit for detecting an interaction between a 
test DNA-binding domain polypeptide and a DNA sequence, comprising: 

i a first gene construct which comprises:

(a) one or more sites for inserting a DNA sequence comprising a 
transcriptional element which includes at least one binding site (DBD 
recognition element) for a DNA-binding domain, 

(b) a translational element operably linked to the transcriptional element,
and

(c) a DNA sequence for a reporter gene which is operably linked with 
the transcriptional and translational elements of the first gene 
construct, and 

wherein the transcriptional and translational elements direct expression of 
the reporter gene in a prokaryotic host cell;

ii a second gene construct for encoding a first fusion protein, which second 
gene construct comprises: 

(a) transcriptional and translational elements which direct expression of 
a protein in a prokaryotic host cell, 

(b) a DNA sequence that encodes an activation tag and which is
operably linked with the transcriptional and translational elements of
the second gene construct, and

(c) one or more sites for inserting a DNA sequence encoding a first test
polypeptide into the second gene construct in such a manner that the
first test polypeptide is expressed in-frame as part of a fusion protein
containing the activation tag;

iii a prokaryotic host cell, and

wherein a desired level of expression of the reporter gene is obtained upon 
interaction of a test polypeptide with a DBD recognition element; and 

wherein the desired level of expression of the reporter gene confers a growth 
advantage on the host cell. 

In certain embodiments, the degree of growth advantage conferred by the expression
of a desired level of the reporter gene may be controlled by varying the growth conditions
of the cell.

In other embodiments, the transcriptional regulatory sequence includes at least two, 
at least three, at least four, or at least five binding sites for a DNA-binding domain. 

In various embodiments, the reporter gene encodes a gene product that gives rise to
a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance.

In another embodiment, the degree of the growth advantage conferred by the desired
level of expression of the reporter gene is controllable by varying the growth conditions of
the host cell. In a first particular embodiment, the reporter gene is the yeast His3 gene and
the degree of the growth advantage is controllable by exposing the host cell to varying
concentrations of 3-aminotriazole. In a second particular embodiment, the reporter gene is
a β-lactamase gene and the degree of the growth advantage is controllable by exposing the
host cell to a β-lactam antibiotic or to a β-lactam antibiotic and a β-lactamase inhibitor.
Examples of β-lactamase genes which may be used in accord with the invention include
TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, PSE-4 and CTX-1,
and functional fragments thereof. Examples of β-lactam antibiotics which may be used
in accord with the invention include penicillins, cephalosporins, monobactams and
carbapenems. Examples of β-lactamase inhibitors which may be used in accord with the
invention include clavulanic acid, sulbactam, tazobactam, brobactam and β-lactamase
inhibitory protein (BLIP). The β-lactam antibiotics and β-lactamase inhibitors are
generally added to the growth medium of the host cells; however, in the case of BLIP, the
inhibitory protein may be expressed within the cell in addition to being added to the growth
medium.

In certain embodiments, the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit. In other embodiments, the activation tag is a polypeptide, a
nucleic acid, or a small molecule, wherein the activation tag binds RNA polymerase, an
RNA polymerase subunit, a functional fragment of an RNA polymerase, or a functional
fragment of an RNA polymerase subunit. In still other embodiments, the activation tag
interacts indirectly with RNA polymerase via at least one intermediary polypeptide, nucleic
acid, or small molecule, which binds to the activation tag and to RNA polymerase. In a
particular embodiment, the activation tag is a fragment of Gal11P, wherein the
activation tag interacts with a fusion between Gal4 and the α subunit of RNA polymerase.

In another embodiment, the reporter gene construct and/or the chimeric gene
construct may be contained within a vector for introduction into the host cell. In particular
embodiments, the vector may be a plasmid or a phagemid. Phagemid vectors are generally
used in conjunction with a host cell that expresses a functional F pilus. Particular examples
of phagemids which may be used in accord with the invention include pBluescriptIISK+ or
pBR-GP-Z12BbsI, or derivatives or precursors thereof. When a phagemid vector is being
utilized, the phagemid may be introduced into the host cell by infection of the host cell with
infectious phage containing the phagemid vector in combination with a helper filamentous
phage. Examples of helper filamentous phage which may be used in accord with the
invention include M13K07, VCS-M13, M13, and f1, and derivatives thereof.

In particular embodiments, the polypeptide is a zinc finger protein, or a library of
known or potential zinc finger proteins. In another embodiment, the binding site for a DNA
binding domain is a binding site for a zinc finger protein, or a library of known or potential
zinc finger binding sites.

In another aspect, the invention features a method for detecting an interaction
between a first test polypeptide and a second test polypeptide, comprising:

i providing a population of host cells wherein each cell contains

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition 
elements) for a DNA-binding domain, 

(b) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a DNA-binding domain and a first test
polypeptide,

(c) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including an activation tag and a second test
polypeptide,

wherein expression of the reporter gene results in a signal detectable by FACS;

wherein interaction of the first fusion protein and second fusion protein in the host
cell results in a desired level of expression of the reporter gene; and

ii isolating host cells comprising an interacting pair of fusion proteins based on
a desired level of expression of the reporter gene using FACS, thereby detecting an
interaction between a first test polypeptide and a second test polypeptide.

In certain embodiments, the method further comprises the step of isolating the
nucleic acid which encodes the test polypeptides. 

In various embodiments, the first, second, or first and second fusion proteins are
members of a library. In particular embodiments, the first fusion protein is part of a library
of at least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 members, the second fusion protein is part of a
library of at least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12 members, or the first and second fusion
proteins are both members of a library such that at least 10^7, 10^8, 10^9, 10^10, 10^11, or 10^12
unique pairs of test polypeptides could be tested for interaction.

In other embodiments, the host cell may be a eukaryotic cell or a prokaryotic cell.
Exemplary eukaryotic cells include yeast and mammalian cells. Exemplary prokaryotic
cells include Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, Serratia,
Streptococcus, Lactobacillus, Enterococcus and Shigella.

In various embodiments, the desired level of expression of the reporter gene may be
an increase, a decrease or no change in the level of expression of the reporter gene as
compared to the basal transcription level of the reporter gene.

In certain embodiments, the transcriptional regulatory sequence includes at least
two, three, four, or five binding sites for a DNA-binding domain.

In various embodiments, the reporter gene encodes a protein product which gives
rise to a detectable signal selected from the group consisting of fluorescence, luminescence
and a cell surface tag. Exemplary fluorescent proteins which may be used in accord with
the invention include green fluorescent protein (GFP), enhanced green fluorescent protein
(EGFP), Renilla reniformis green fluorescent protein, GFPmut2, GFPuv4, enhanced
yellow fluorescent protein (EYFP), enhanced cyan fluorescent protein (ECFP), enhanced
blue fluorescent protein (EBFP), citrine and red fluorescent protein from Discosoma
(dsRED).

Alternatively, the reporter gene may encode a cell surface tag. In association with
this embodiment, the method may further comprise the step of contacting the host cell with
a fluorescently labeled antibody specific for the cell surface tag, thereby labeling the host
cell, before isolation of host cells by FACS.

In a further embodiment, the host cell further contains a second reporter gene such
that interaction of a first fusion protein and a second fusion protein in a host cell results in a
desired level of expression of the second reporter gene. The desired level of expression of
the second reporter gene may be an increase, decrease, or no change in the level of
expression of the second reporter gene as compared to the basal transcription level of the
second reporter gene.

In certain embodiments, host cells are isolated wherein the desired level of
expression of the first and second reporter genes is an increase in the expression level of the
reporter genes as compared to the basal expression level of the reporter genes. In another
embodiment, host cells are isolated wherein the desired level of expression of the first
reporter gene is an increase in the expression level of the first reporter gene as compared to
the basal transcription level of the first reporter gene, and the desired level of expression of
the second reporter gene is a smaller increase in the expression level of the second reporter
gene as compared to the basal transcription level of the second reporter gene relative to the
increase in expression of the first reporter gene. In additional embodiments, host cells may
be isolated based on a decrease in the level of expression of one of the reporter genes as
compared to the basal transcription level of the reporter gene.

In another embodiment, the first and second reporter genes are operably linked to
the same transcriptional regulatory sequence. Alternatively, the first and second reporter
genes may be operably linked to separate copies of the same transcriptional regulatory
sequence. Further, the first and second reporter genes may be operably linked to different
transcriptional regulatory sequences. Further, the first and second fusion proteins may be
expressed from the same nucleic acid construct or from separate nucleic acid constructs.

In various embodiments, the second reporter gene encodes a gene product that gives
rise to a detectable signal selected from the group consisting of color, fluorescence,
luminescence, a cell surface tag, cell viability, relief of a cell nutritional requirement, cell
growth and drug resistance. In particular embodiments, the second reporter gene encodes
green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), Renilla
reniformis green fluorescent protein, GFPmut2, GFPuv4, enhanced yellow fluorescent
protein (EYFP), enhanced cyan fluorescent protein (ECFP), enhanced blue fluorescent
protein (EBFP), citrine or red fluorescent protein from Discosoma (dsRED).

In another embodiment, the reporter gene encodes a cell surface tag. In accord with
this embodiment, the method may further comprise the step of contacting the host cell with
a fluorescently labeled antibody specific for the cell surface tag, thereby labeling the host
cell, before isolation of host cells by FACS.

In certain embodiments, the first and second reporter genes encode proteins which
can be analyzed independently, simultaneously, or independently and simultaneously. In
other embodiments, the first and second reporter genes are analyzed by FACS.

In a further embodiment, the expression level of the first, second, or first and second
fusion proteins can be controlled by varying the growth conditions of the host cell. For
example, in a particular embodiment, the expression level of the first and second fusion
proteins can be controlled by varying the concentration of IPTG, anhydrotetracycline, or
IPTG and anhydrotetracycline to which the host cell is exposed. In another embodiment,
the first, second, or first and second fusion proteins are expressed from a promoter
comprising a binding site for the lac repressor or the tet repressor. In certain embodiments,
the expression level of the first and second fusion protein can be independently controlled.

In another embodiment, the reporter gene construct and/or the chimeric gene
constructs may be contained within a vector for introduction into the host cell. In particular
embodiments, the vector may be a plasmid or a phagemid. Phagemid vectors are generally
used in conjunction with a host cell that expresses a functional F pilus. Particular examples
of phagemids which may be used in accord with the invention include pBluescriptIISK+ or
pBR-GP-Z12BbsI, or derivatives or precursors thereof. When a phagemid vector is being
utilized, the phagemid may be introduced into the host cell by infection of the host cell with
infectious phage containing the phagemid vector in combination with a helper filamentous
phage. Examples of helper filamentous phage which may be used in accord with the
invention include M13K07, VCS-M13, M13, and f1, and derivatives thereof.

In a further aspect, the invention features a method for selecting a polypeptide that
differentially interacts with at least two different test polypeptides, comprising:

(i) providing a population of host cells wherein each cell contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a first DNA-binding domain and a first test
polypeptide,

(d) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including a second DNA-binding domain and a
second test polypeptide,

(e) a third chimeric gene which encodes a third fusion protein, the third
fusion protein including an activation tag and third test polypeptide,

wherein expression of the first and second reporter genes results in a signal
detectable by FACS;

wherein interaction of the first fusion protein and the third fusion protein in the host
cell results in a desired level of expression of the first reporter gene;

wherein interaction of the second fusion protein and the third fusion protein in the
host cell results in a desired level of expression of the second reporter gene; and

(ii) isolating host cells comprising a third fusion protein capable of interacting
with the first fusion protein, the second fusion protein, or the first and the second
fusion proteins based on a desired level of expression of the first reporter gene, the
second reporter gene, or the first and second reporter genes, respectively, using
FACS, thereby selecting a polypeptide that differentially interacts with at least two
different test polypeptides.
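
The isolation step (ii) above amounts to gating cells on two reporter signals relative to their basal levels. The Python sketch below illustrates that logic under stated assumptions: the `Cell` type, the function names, and the fold-change threshold `min_fold` are hypothetical and introduced only for this example, and a real sort would set its gates on the instrument from control populations.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Cell:
    first_reporter: float   # signal from the first reporter (arbitrary units)
    second_reporter: float  # signal from the second reporter (arbitrary units)


def is_induced(signal: float, basal: float, min_fold: float = 3.0) -> bool:
    """True if the signal is at least min_fold times the basal level.
    The default of 3.0 is an arbitrary illustrative threshold."""
    return signal >= min_fold * basal


def classify(cell: Cell, basal_first: float, basal_second: float,
             min_fold: float = 3.0) -> str:
    """Label a cell by which reporter(s) exceed the induction threshold."""
    first = is_induced(cell.first_reporter, basal_first, min_fold)
    second = is_induced(cell.second_reporter, basal_second, min_fold)
    if first and second:
        return "both"
    if first:
        return "first_only"
    if second:
        return "second_only"
    return "neither"


def gate(cells: Iterable[Cell], wanted: str, basal_first: float,
         basal_second: float, min_fold: float = 3.0) -> List[Cell]:
    """Keep only the cells whose label matches the desired expression profile."""
    return [c for c in cells
            if classify(c, basal_first, basal_second, min_fold) == wanted]
```

Gating on `"first_only"` would correspond to selecting a third fusion protein that interacts with the first fusion protein but not the second; `"both"` and `"second_only"` correspond to the other profiles named in step (ii).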

In various embodiments, the fusion protein is assayed for the ability to interact with
at least two, three, four or five different DNA sequences each operably linked to reporter
genes. In certain embodiments, the reporter genes are operably linked to the same
transcriptional regulatory sequence. Alternatively, the reporter genes are operably linked to
separate copies of the same transcriptional regulatory sequence. Further, the reporter genes
may be operably linked to different transcriptional regulatory sequences.

In various embodiments, all of the reporter genes encode different proteins and each
reporter gene may be detected independently, simultaneously, or independently and
simultaneously. In a further embodiment, at least one of the reporter genes encodes a
fluorescent protein. Alternatively, all of the reporter genes may encode fluorescent
proteins.

In certain embodiments, the method further comprises the step of isolating the
nucleic acid which encodes at least one of the fusion proteins.

In various embodiments, the desired level of expression of at least one of the
reporter genes is an increase, a decrease, or no change in reporter gene expression as
compared to the basal transcription level of the reporter gene.

In a particular embodiment, host cells are isolated that have one reporter gene whose
expression level is increased to a greater extent than the increase in the expression level of
the other reporter genes, as compared to the basal level of expression of the reporter genes.
In some cases, the reporter genes whose expression levels increase to a lesser extent (or not
at all) are all the same reporter gene and are different from the reporter gene whose level is
increased.

In other embodiments, the host cell may be a eukaryotic cell or a prokaryotic cell. 
Exemplary eukaryotic cells include yeast and mammalian cells. Exemplary prokaryotic 
cells include Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, Serratia, 
Streptococcus, Lactobacillus, Enterococcus and Shigella.

In another aspect, the invention features a method for detecting an interaction
between a test polypeptide and a DNA sequence, comprising

(i) providing a population of host cells wherein each cell contains

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein expression of the reporter gene results in signal detectable by FACS;

wherein interaction between a test polypeptide of a fusion protein and a DBD
recognition element in a host cell results in a desired level of expression of the reporter
gene; and

(ii) isolating host cells comprising a fusion protein that interacts with a DBD
recognition element based on a desired level of expression of the reporter gene using FACS,
thereby detecting an interaction between the test polypeptide and the DBD recognition
element DNA sequence.

In certain embodiments, the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit. In other embodiments, the activation tag is a polypeptide, a
nucleic acid, or a small molecule, and wherein the activation tag binds RNA polymerase, an
RNA polymerase subunit, a functional fragment of an RNA polymerase, or a functional
fragment of an RNA polymerase subunit. In still other embodiments, the activation tag
interacts indirectly with RNA polymerase via at least one intermediary polypeptide, nucleic
acid, or small molecule, which binds to the activation tag and to RNA polymerase. In a
particular embodiment, the activation tag is a fragment of Gal11P, and wherein the
activation tag interacts with a fusion between Gal4 and the α subunit of RNA polymerase.

In certain embodiments, the method further comprises the step of isolating the
nucleic acid which encodes the test polypeptides.

In various embodiments, the DBD recognition element is a member of a library of at
least 10^ 10^ 10^ 10^^ 10^^ or 10'^ potential binding sites for a DNA binding domain,
wherein host cells comprising a DBD recognition element bound by a test polypeptide are
isolated. Alternatively, the DBD recognition element is a desired binding site for a DNA
binding domain and the test polypeptide is a member of a library of at least 10^, 10^ 10^
10^^, 10^\ or 10^^ polypeptides, wherein host cells comprising a polypeptide which binds to
the DBD recognition element are isolated. In a further embodiment, the DBD recognition
element is a member of a library of potential binding sites for a DNA binding domain and the
test polypeptide is a member of a library of polypeptides, wherein host cells comprising a
polypeptide that binds a DBD recognition element are isolated.

In certain embodiments, the polypeptides are zinc finger proteins. In other
embodiments, binding sites for a DNA binding domain bind a zinc finger protein. The
methods of the invention may be used to find DNA sequences which bind to a known or
novel zinc finger protein. Alternatively, the methods of the invention may be used to
isolate known or novel polypeptides which bind to a test DNA sequence.

In another embodiment, the reporter gene construct and/or the chimeric gene
construct may be contained within a vector for introduction into the host cell. In particular
embodiments, the vector may be a plasmid or a phagemid. Phagemid vectors are generally
used in conjunction with a host cell that expresses a functional F pilus. Particular examples
of phagemids which may be used in accord with the invention include pBluescriptIISK+ or
pBR-GP-Z12BbsI, or derivatives or precursors thereof. When a phagemid vector is being
utilized, the phagemid may be introduced into the host cell by infection of the host cell with
infectious phage containing the phagemid vector in combination with a helper filamentous
phage. Examples of helper filamentous phage which may be used in accord with the
invention include M13K07, VCS-M13, M13, and f1, and derivatives thereof.

In other embodiments, the host cell may be a eukaryotic cell or a prokaryotic cell.
Exemplary eukaryotic cells include yeast and mammalian cells. Exemplary prokaryotic
cells include Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, Serratia,
Streptococcus, Lactobacillus, Enterococcus and Shigella.

In a further aspect, the invention features a method for selecting a polypeptide that
differentially interacts with at least two different DNA sequences, comprising

(i) providing a population of host cells each of which contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein expression of the first and second reporter genes results in a signal
detectable by FACS;

wherein interaction of a fusion protein with the first DBD recognition element in the
host cells results in a desired level of expression of the first reporter gene;

wherein interaction of a fusion protein with the second DBD recognition element in
the host cells results in a desired level of expression of the second reporter gene; and

(ii) isolating host cells comprising a fusion protein that interacts with the first
DBD recognition element, the second DBD recognition element, or the first and second
DBD recognition elements based on a desired level of expression of the first reporter gene,
the second reporter gene, or the first and second reporter genes, respectively, using FACS,
thereby selecting a polypeptide that differentially interacts with at least two different DNA
sequences.

In another aspect, the invention features a method for detecting an interaction
between a test RNA binding domain polypeptide and an RNA sequence, comprising

(i) providing a population of host cells wherein each cell contains

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a first chimeric gene which encodes a fusion protein, the fusion
protein including a DNA-binding domain and a first RNA binding
domain,

(c) a second chimeric gene which encodes a fusion protein, the fusion
protein including an activation tag and a second RNA binding
domain,

(d) a third chimeric gene which encodes a hybrid RNA, the hybrid RNA
comprising a first RNA sequence that binds one of the first or second
RNA binding domains and a second RNA sequence to be tested for
interaction with the RNA-binding domain not bound to the first RNA
sequence;

wherein the expression of the reporter gene produces a signal detectable by FACS;

wherein interaction of an RNA-binding domain not bound to the first RNA
sequence with the second RNA sequence in a host cell results in a desired level of
expression of the reporter gene; and

(ii) isolating host cells comprising an RNA-binding domain that interacts with
the second RNA sequence based on a desired level of expression of the reporter gene
thereby detecting an interaction between a test RNA binding domain polypeptide and an
RNA sequence using FACS.

In a further aspect, the invention features a kit for selecting a polypeptide that
interacts with a test polypeptide, comprising:

(i) a first gene construct for encoding a first fusion protein, which first gene
construct comprises:

(a) transcriptional and translational elements which direct expression of
a protein in a host cell,

(b) a DNA sequence that encodes a DNA binding domain and which is
operably linked with the transcriptional and translational elements of
the first gene construct, and

(c) one or more sites for inserting a DNA sequence encoding a first test
polypeptide into the first gene construct in such a manner that the
first test polypeptide is expressed in-frame as part of a fusion protein
containing the DNA binding domain;

(ii) a second gene construct for encoding a second fusion protein, which second
gene construct comprises:

(a) transcriptional and translational elements which direct expression of
a protein in a host cell,

(b) a DNA sequence that encodes an activation tag and which is
operably linked with the transcriptional and translational elements of
the second gene construct, and

(c) one or more sites for inserting a DNA sequence encoding a second
test polypeptide into the second gene construct in such a manner that
the second test polypeptide is expressed in-frame as part of a fusion
protein containing the activation tag;

(iii) a host cell containing at least one reporter gene having one or more binding
sites (DBD recognition elements) for the DNA binding domain;

wherein expression of the reporter gene produces a signal detectable by FACS; and

wherein a desired level of expression of the reporter gene is obtained upon
interaction of the first and second fusion proteins and can be analyzed using FACS.

In still another aspect, the invention features a kit for detecting an interaction
between a test DNA-binding domain polypeptide and a DNA sequence, comprising:

(i) a first gene construct which comprises:

(a) one or more sites for inserting a DNA sequence comprising a
transcriptional element which includes at least one binding site (DBD
recognition element) for a DNA-binding domain,

(b) a translational element operably linked to the transcriptional element,
and

(c) a DNA sequence for at least one reporter gene which is operably
linked with the transcriptional and translational elements of the first
gene construct, and

wherein the transcriptional and translational elements direct expression of
the reporter gene in a host cell;

(ii) a second gene construct for encoding a first fusion protein, which second
gene construct comprises:

(a) transcriptional and translational elements which direct expression of
a protein in a host cell,

(b) a DNA sequence that encodes an activation tag and which is
operably linked with the transcriptional and translational elements of
the second gene construct, and

(c) one or more sites for inserting a DNA sequence encoding a first test
polypeptide into the second gene construct in such a manner that the
first test polypeptide is expressed in-frame as part of a fusion protein
containing the activation tag;

(iii) a host cell;

wherein expression of the reporter gene produces a signal detectable by FACS; and

wherein a desired level of expression of the reporter gene is obtained upon
interaction of a test polypeptide with a DBD recognition element and can be analyzed by
FACS.

Other features and advantages of the invention will be apparent from the following
detailed description, and from the claims. The practice of the present invention will
employ, unless otherwise indicated, conventional techniques of cell biology, cell culture,
molecular biology, transgenic biology, microbiology, recombinant DNA, and immunology,
which are within the skill of the art. Such techniques are explained fully in the literature.
See, for example, Molecular Cloning A Laboratory Manual, 2nd Ed., ed. by Sambrook,
Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1989); DNA Cloning, Volumes
I and II (D. N. Glover ed., 1985); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis
et al. U.S. Patent No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins
eds. 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds. 1984);
Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the
treatise, Methods In Enzymology (Academic Press, Inc., N.Y.); Gene Transfer Vectors For
Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor
Laboratory); Methods In Enzymology, Vols. 154 and 155 (Wu et al. eds.);
Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds.,
Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV
(D. M. Weir and C. C. Blackwell, eds., 1986); Manipulating the Mouse Embryo (Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986).

Brief Description of the Figures 

Fig. 1. (A) Transcriptional activation in a previously described E. coli-based genetic
screen - developed by Hochschild and colleagues (refs 8,10) - for studying protein-DNA
and protein-protein interactions. (B) Modified reporter template for our E. coli-based
genetic selection system. (C) Model for transcriptional activation of the Pzif promoter by
fusion proteins Gal11P-Zif123 and αGal4. ZF1, ZF2, and ZF3 are the three zinc fingers of
the Zif268 protein.

Fig. 2. An E. coli-based selection system for identifying zinc finger variants from
large randomized libraries. The left side of the figure depicts a selection strain cell bearing
a randomized zinc finger (white oval) that is unable to bind the target DNA subsite of
interest (black box). This candidate fails to activate transcription of the weak promoter
controlling HIS3 expression and therefore cells expressing this candidate fail to grow on
HIS selective medium. The right side of the figure depicts a library candidate bearing a
particular zinc finger (one member of the randomized library) (black oval) that can bind the
target DNA site. This candidate can activate HIS3 expression and therefore cells
expressing this candidate grow on HIS selective medium.

Fig. 3. Recognition helix sequences of fingers isolated by our selection. For
candidates that were isolated multiple times (as judged by nucleotide sequence), the number
of clones obtained is shown in parentheses. The consensus sequence(s) of fingers selected
by phage display for each target subsite are also shown (ref. 6, + denotes a positively
charged residue, _ denotes no discernible preference). Asterisks indicate candidates with a
2 bp deletion downstream of the sequence encoding the recognition helix. Arrows illustrate
a few of the most plausible potential base contacts.

Fig. 4. Illustrates the behavior of various fluorescent proteins in the bacterial
two-hybrid system.

Fig. 5. Isolation of positive candidates from a mock library using flow cytometry.

Fig. 6. This graph depicts the results of a certain embodiment of the subject
interaction trap assay wherein the bait and prey protein expression levels can be
individually controlled.

Fig. 7. Description of the two color TZ reporter system used for the experiment
described in Figure 8. In this reporter, EGFP and RFP are each under the control of a weak
promoter (pLac and a hybrid pRM/pLac respectively). When a Gal11p containing bait
protein binds to the Zif268 site, it causes increased EGFP production which can be
measured in, e.g., fluorescence channel 1 (FL1). Similarly, when the Gal11p containing
bait protein binds to the T11 site, it causes increased RFP production, which can be
measured in, e.g., fluorescence channel 2 (FL2).

Fig. 8. This plot displays the results from three separate experiments in which
otherwise identical cells, each containing the two color TZ reporter (shown in Figure 7), are
expressing either Gal11p-Zif268, which should interact only with the Zif268 binding site;
Gal11p-T11, which should interact only with the T11 site; and Gal11p-Z12, which should
interact with neither binding site. Each dot indicates the amount of EGFP and RFP signal
for an individual cell. The data for 1000 cells from each group is shown.

Fig. 9. This figure shows the results of a certain embodiment of the subject
interaction trap assay wherein a DNA-sequence can be selected which interacts with a
specific protein.

Detailed Description of the Invention 

In order to address certain of the above-described deficiencies in the art, the
inventors herein disclose various embodiments of the ITS which permit the use of
interaction trap assays capable of analyzing libraries exceeding the current limitation of 10^
candidate sequences by several orders of magnitude. Certain versions of the subject assays
are designed for detecting DNA-protein interactions (including tests of their specificity),
while other embodiments are designed for detecting protein-protein interactions. Similar
methods could be used to screen for drugs that facilitate or interfere with such interactions.

One feature of the subject assay which facilitates the search of large libraries is that it
permits a more exhaustive search of the sequence space for transcriptional regulatory
sequences and useful naturally occurring and/or synthetic polypeptides. In addition,
methods that permit the simultaneous and independent measurement of multiple reporters
and the isolation of cells with desired reporter gene expression "profiles" are described
(such methods can be applied, in principle, to either prokaryotic or eukaryotic [e.g.,
yeast or mammalian] cells). Finally, methods for constructing libraries of plasmids that can
be introduced into and "rescued" from cells without the need for transformation or plasmid
isolation are described. This aspect of the subject invention also provides a means for
producing combinations of interacting pairs that exceed current limits of cell transformation
efficiency.

The goal of all of the methods described in this application is to identify, modify, or
optimize proteins, small molecules (drugs), or nucleic acid sequences with affinities and
specificities for their target interaction partner(s) that permit them to function effectively in
vivo. We note that the output of one or more of the methods disclosed here may be one or
more candidates (that is, a pool or enriched library of candidates) that have potentially
desirable characteristics for use in in vivo contexts. Pools of candidates may also require
additional testing in mammalian cells or other functional assays to determine which
candidate(s) will be most useful in vivo.

1. Overview 

A. High throughput analysis of large libraries.

The present invention provides several embodiments of detection techniques which
facilitate the screening of large libraries of sequences, e.g., greater than 10^ different
sequences, and more preferably greater than 10* , 10^10^*^, or lO" different sequences.

One of those embodiments, the use of flow cytometry with (optionally) multiple 
FACS-active reporters, is discussed further in Section 1(B) below. 

In another embodiment of the subject assay, the reporter gene encodes a gene
product which confers a growth advantage (which is "tunable" in the preferred
embodiment) to a prokaryotic host cell, rather than merely a visual screening marker. By
"tunable", it is meant that the activity of the reporter gene product, and therefore the
stringency of the ITS, can be adjusted, such as by use of a competitive inhibitor of the
reporter gene product. To further illustrate this strategy, we have discovered that,
surprisingly, the HIS3 reporter gene, along with the use of 3-AT, can be used to rescue a
prokaryotic host cell in HIS selective media with sufficient stringency to be able to
successfully isolate interacting pairs from a large library of variants. Lack of stringency in
other systems can result in isolation of a significant population of background or
breakthrough false positives, as described in the Background section above. In large
libraries, a high percentage of false positives can make the isolation and identification of
true interactors time consuming, if not impossible. In the case of the HIS3 reporter, the use
of 3-AT (a competitive inhibitor of HIS3) can facilitate the selection of cells in which the
HIS3 reporter is highly expressed, and thereby lower the number of weak interactions/false
positives in the enriched product.
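
One way to picture how a competitive inhibitor makes the selection "tunable" is the textbook rate law for competitive inhibition, in which the inhibitor raises the apparent Km of the enzyme. The Python sketch below is a toy model only: it assumes, for illustration, that growth requires a fixed minimum reaction rate from the reporter enzyme, and all parameter names and values are hypothetical rather than measured properties of HIS3 or 3-AT.

```python
def apparent_km(km: float, inhibitor: float, ki: float) -> float:
    """Apparent Km under competitive inhibition: Km * (1 + [I] / Ki)."""
    return km * (1.0 + inhibitor / ki)


def enzyme_needed(required_rate: float, kcat: float, substrate: float,
                  km: float, inhibitor: float, ki: float) -> float:
    """Enzyme amount E at which v = kcat * E * S / (Km_app + S) reaches
    required_rate. 'required_rate' stands in for the (assumed) minimum
    flux a cell needs in order to grow on selective medium."""
    km_app = apparent_km(km, inhibitor, ki)
    return required_rate * (km_app + substrate) / (kcat * substrate)
```

In this model the reporter expression required for growth rises linearly with inhibitor concentration, which is one way to read the statement that the inhibitor favors cells in which the reporter is highly expressed.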

Thus, the subject assay can be set up to utilize a reporter gene system that reduces
the number of false positive interactions to less than 50% of an enriched library, and more
preferably less than 25 percent, or even 10, 5 or 1 percent. In a preferred embodiment, the
assay reduces the occurrence rate of breakthrough false positives to less than 1:10^, and
even more preferably less than 1:10^ 1:10^ or even 1:10^^.

B. Flow-ITS embodiments 

The flow-ITS technique of the present invention provides an interaction trap system 
having a detection step in which expression of the reporter gene permits selection of cells 
by flow cytometry. In preferred embodiments, the assay also includes a preselection step in 
which the population of cells subjected to FACS analysis is pre-enriched for interactors. 
The subject assay relies on the use of reporter genes which express gene products that are 
(i) localized to the cell surface (a cell surface protein) and include an extracellular domain 
which can be tagged with an antibody or other binding moiety, or (ii) fluorescently active, 
or both. 

The first, though optional, step of the flow-ITS is a "pre-flow enrichment" step that 
permits throughput of extremely large numbers of cells from the interaction trap (the "ITS 
cells"). In this step, ITS cells that express a particular reporter cell surface protein are 
identified and isolated in an affinity separation step. To accomplish this, the ITS cells 
include a reporter gene which encodes a cell surface protein (referred to herein as a "surface 
FACS tag" protein). Upon development of the interaction trap, e.g., after sufficient time 
has elapsed such that expression of the reporter gene will have occurred in cells in which 
the bait and prey proteins interact, the ITS cells are applied to a matrix which can be 
sequestered and which includes a moiety that interacts with the surface FACS tag protein. 
In this manner, ITS cells expressing the surface FACS tag can be sequestered on the matrix 
and thereby separated from ITS cells which do not express at least a certain threshold level 
of the surface FACS tag. As described in further detail below, this pre-enrichment step 
permits the screening of initial ITS cell populations exceeding 10^3 cells per day using 
conventional columns. 



In other embodiments, a pre-flow enrichment step can be used wherein the host cell 
also includes a reporter gene construct encoding a growth selection marker, such as the 
HIS3 gene construct described above, which permits enrichment of the cell population by 
growth selection prior to the cytometric sorting step. In one embodiment, the reporter gene 
is a multicistronic reporter, e.g., the coding sequences for the FACS tag and growth selection 
marker being under the control of the same transcriptional regulatory sequence(s) and 
arranged such that a single mRNA transcript includes both coding sequences. In such 
embodiments, it may be necessary to include other elements well known in the art, such as 
internal ribosome entry sequences (IRES) and the like in order to obtain a suitable level of 
translation of the additional coding sequences found in the transcript. 

The second step of the subject flow-ITS involves the use of fluorescence activated 
cell sorting (FACS) techniques. In this step, ITS cells expressing a reporter gene encoding 
a surface FACS tag or a fluorescently active polypeptide (whether localized to the 
cytoplasm or cell surface) can be detected and thus can be isolated by flow cytometry. As 
described in further detail below, state-of-the-art FACS techniques can sort cells at rates up 
to 70,000 cells/sec in "purity sort mode" (wherein the resultant sorted population of cells is 
relatively pure), and at rates of greater than 100,000 cells/sec in "enrich mode" (wherein the 
resultant sorted population of cells is less pure) (www.cytomation.com/noncomm/products/ 
prod cyto_mls.htnd). Thus, with currently available FACS technology, greater than 6 x 
10^9 cells can be sorted per day. 
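
The per-day sorting figure follows directly from the quoted sort rates. As a quick check, the sketch below multiplies a sort rate by the number of seconds in a day; the `duty_cycle` parameter and the function names are assumptions added for illustration (real instruments do not run uninterrupted for 24 hours).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def cells_sorted_per_day(cells_per_second, duty_cycle=1.0):
    """Cells processed in one day at a given sort rate.

    `duty_cycle` (hypothetical) is the fraction of the day the sorter runs.
    """
    return cells_per_second * SECONDS_PER_DAY * duty_cycle


def days_to_sort(library_cells, cells_per_second, duty_cycle=1.0):
    """Days needed to pass `library_cells` cells through the sorter."""
    return library_cells / cells_sorted_per_day(cells_per_second, duty_cycle)


if __name__ == "__main__":
    print(f"purity sort mode: {cells_sorted_per_day(70_000):.3e} cells/day")
    print(f"enrich mode:      {cells_sorted_per_day(100_000):.3e} cells/day")
```

At 70,000 cells/sec this gives about 6.05 x 10^9 cells per day, and at 100,000 cells/sec about 8.64 x 10^9 cells per day, assuming continuous operation.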

In addition, modern FACS equipment can simultaneously sort based on 
fluorescence at different wavelengths, e.g., can detect the expression of two or more 
different reporter genes and gate cells for isolation accordingly. 

In particular embodiments, it may be desirable to provide two or more reporter gene 
constructs which are chosen because of a desire to determine if their expression is regulated 
by interaction of the bait and prey proteins and transcriptional regulatory elements of each 
reporter. The reporter genes can both encode direct FACS tags, indirect FACS tags, or a 
combination thereof. One or more of the reporter genes can encode a polypeptide which 
can be used in the pre-flow enrichment step described below. 

The simultaneous expression of the various reporter genes (whether provided on the 
same or separate plasmids) provides a means for distinguishing actual interaction of the bait 
and prey proteins and transcriptional regulatory elements from, e.g., mutations or other 
spurious activation of the reporter gene and also provides a means for selecting proteins 
with the desired specificity. In one embodiment of a multiple reporter assay, the subject 
flow-ITS can be used to identify a DNA binding domain (as described in further detail 
below). For instance, multiple reporter gene constructs can be used in order to permit 
isolation of domains with selective binding activity. For example, the ITS host cell can 
include one or more reporter genes having transcriptional regulatory sequences for which a 
DNA binding domain is sought. At the same time, the cells can also include one or more 
reporter genes, encoding different FACS markers than above, under the control of 
transcriptional regulatory sequences for which it is desired that the DBD being sought does 
not bind to or activate expression from. Thus, cells can be sorted on the basis of differential 
expression of the reporter genes. Extensions of this method could be developed to analyze 
the specificity of protein-protein interactions (see example in Background above). 
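
Sorting on differential expression of two reporters amounts to a two-parameter gate. The following is a minimal sketch under stated assumptions: each event carries one fluorescence value for a reporter driven by the desired target site and one for a reporter driven by a site that should not be bound. The `Event` record, the field names, and the threshold values are all hypothetical and are not taken from the specification or from any cytometer software.

```python
from dataclasses import dataclass


@dataclass
class Event:
    cell_id: str
    target_signal: float      # reporter under the sequence the DBD should bind
    off_target_signal: float  # reporter under a sequence it should not bind


def gate_selective(events, target_min, off_target_max):
    """Keep events that are bright on the target reporter and dim on the
    off-target reporter (both thresholds are user-chosen)."""
    return [
        e for e in events
        if e.target_signal >= target_min and e.off_target_signal <= off_target_max
    ]


if __name__ == "__main__":
    events = [
        Event("selective", 900.0, 20.0),      # binds the target only
        Event("promiscuous", 950.0, 800.0),   # activates both reporters
        Event("non_binder", 30.0, 10.0),      # activates neither
    ]
    for e in gate_selective(events, target_min=500.0, off_target_max=100.0):
        print(e.cell_id)
```

Only the event that is high on the target reporter and low on the off-target reporter passes the gate, which is the behavior a specificity screen would look for.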

The prokaryotic interaction trap systems described herein provide advantages over 
the conventional eukaryotic ITS. For example, the transformation frequency of prokaryotic 
cells permits the creation of host cells harboring libraries larger than 10^. The use of 
bacterial host cells to generate an interaction trap system also provides a system which is 
generally easier to manipulate genetically relative to the eukaryotic systems. Furthermore, 
bacterial host cells are easier to propagate. The shorter doubling times for bacteria will 
often provide for development of a FACS-detectable signal in the ITS in a shorter time 
period than would be obtained with a eukaryotic ITS. 

Yet another benefit which may be realized by the use of the prokaryotic ITS is 
lower spurious activation relative to, e.g., the ITS fusion proteins employed in yeast. In 
eukaryotic cells, spurious transcription activation by a bait polypeptide having a high acidic 
residue content can be problematic. This is not expected to be an impediment for the use of 
such bait polypeptides in the prokaryotic ITS. 

Another benefit in the use of the prokaryotic ITS is that, in contrast to the eukaryotic 
systems, nuclear localization of the bait and prey polypeptides is not a concern in bacterial 
cells. 

Still another advantage of the use of the prokaryotic ITS can be realized where the 
bait and/or prey polypeptides are derived from eukaryotic sources, such as human. One 
problem which can occur when using the yeast-based ITS of the prior art is that 
mammalian/eukaryotic derived bait or prey may retain sufficient biological activity in yeast 
cells so as to confound the results of the ITS. The greater evolutionary divergence between 
mammals and bacteria reduces the likelihood of a similar problem in the prokaryotic ITS of 
the present invention. 



C. Directed Evolution 




Moreover, the subject method can be used for directed evolution involving 
protein-protein interactions, protein-DNA interactions, protein-drug interactions, or drug-DNA 
interactions. For instance, identified interacting pairs can be improved by additional 
rounds of mutagenesis, selection, and amplification, e.g., diversity can be introduced into 
one or both of the identified interacting pair, and the resulting library screened according 
to the present invention. The goal may be, for instance, to use such a process to optimize 
the binding characteristics, e.g., for tighter binders and/or better selectivity in binding. 
Diversity can be introduced by most any standard mutagenesis technique, such as by 
irradiation, chemical treatment, low fidelity replication, use of randomized PCR primers, 
etc. (see below). Moreover, the ability to selectively control (tune) the stringency of the 
isolation/detection step (and therefore provide the user with the ability to set specific 
cutoffs or windows) in the subject assay format, or to use multiple FACS tags and thus 
directly test for specificity, can be extremely beneficial for directed evolution approaches. 
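
The cycle of mutagenesis, selection, and amplification with a stringency cutoff that is raised from round to round can be sketched as a toy simulation. Everything here is an illustrative assumption rather than the method of the specification: sequences are short strings, "binding" is scored as the number of positions matching a hidden optimum, and the names `score`, `mutate`, and `evolve` are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def score(seq, target):
    """Toy stand-in for binding strength: positions matching a hidden optimum."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rate, rng):
    """Replace each position with a random residue with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in seq)


def evolve(pool, target, rounds=5, offspring=20, rate=0.1, max_pool=50, seed=0):
    """Run several rounds of mutagenesis, selection, and amplification."""
    rng = random.Random(seed)
    best_per_round = []
    for round_index in range(rounds):
        # Mutagenesis: keep every parent and add mutated copies of it.
        library = set(pool)
        for parent in pool:
            library.update(mutate(parent, rate, rng) for _ in range(offspring))
        scored = sorted(((score(s, target), s) for s in library), reverse=True)
        # Selection: the cutoff rises each round (tunable stringency), capped
        # at the best score present so the pool is never empty.
        cutoff = min(scored[0][0], round_index + 1)
        survivors = [s for sc, s in scored if sc >= cutoff]
        # Amplification: survivors (best first) seed the next round.
        pool = survivors[:max_pool]
        best_per_round.append(scored[0][0])
    return pool, best_per_round


if __name__ == "__main__":
    pool, best = evolve(["AAAAAAAA"], target="ACDEFGHI")
    print("best score per round:", best)
```

Because parents are carried forward alongside their mutated copies, the best score in this sketch never decreases between rounds; a real experiment has no such guarantee.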

D. Selecting DNA-protein interactions 

In addition to protein-protein interactions, the various ITS embodiments described 
herein can be used to identify protein-DNA interactions. DNA-binding proteins, such as 
transcription factors, are critical regulators of gene expression. For example, transcriptional 
regulatory proteins are known to play a key role in cellular signal transduction pathways 
which convert extracellular signals into altered gene expression (Curran and Franza, (1988) 
Cell 55:395-397). DNA-binding proteins also play critical roles in the control of cell growth 
and in the expression of viral and bacterial genes. A large number of biological and clinical 
protocols, including among others, gene therapy, production of biological materials, and 
biological research, depend on the ability to elicit specific and high-level expression of 
genes encoding RNAs or proteins of therapeutic, commercial, or experimental value. Such 
gene expression is very often dependent on protein-DNA interactions. 

E. Construction of Phagemid-Based Libraries 

Another aspect of the present invention describes a method for constructing 
protein-encoding libraries that (once constructed using standard transformation procedures) can be 
introduced into bacterial cells without the need for additional transformation. Members of 
this library can then be "rescued" from bacterial cells without the need to perform 
labor-intensive plasmid extraction and introduced into bacterial cells again without the need for 
transformation. This method is particularly useful for library vs. library screening/selection 
experiments, for directed or continuous evolution strategies, for serial selection protocols 
designed to reduce background false positives, and for automating the processing and 
re-testing of positive candidates from a screen/selection. 

One embodiment of this aspect of the invention is to construct protein-encoding 
libraries on phagemid vectors. Phagemid vectors (e.g., pBluescript II SK+ or 
pBR-GP-Z12BbsI [from Example 1 below]) harbor two origins of replication: one (e.g., the ColE1 
origin) permits replication as a standard multicopy, double-stranded plasmid and the second 
(e.g., the F1 origin) permits replication as a single-stranded filamentous phage genome if 
phage-encoded proteins are also expressed in the cell. Infection of cells harboring the 
double-stranded phagemid with a filamentous helper phage (attenuated in its ability to 
replicate by mutations in its own origin of replication) results in the production of infectious 
phage particles containing single-stranded versions of the phagemid. Even if multiple 
plasmids are present in the cell (as is the case for most ITS experiments), the phagemid can 
be selectively rescued as phage using this system. These phage particles can be used to 
"infect" new bacterial cells, resulting in the introduction of the single-stranded phagemid 
which then replicates as a standard double-stranded plasmid. (Note that cells can only be 
infected if they express an F pilus.) Thus, this methodology permits the rescue of 
phagemids from cells by infection with a helper phage and their subsequent introduction 
into fresh cells by simple infection. 

This phagemid-based technology can be used to facilitate large library vs. library 
experiments. For example, one could create a library of 10^ or more prey proteins by 
introducing them into E. coli using standard transformation methods and then "rescuing" 
the library as phage by infecting the transformed cells with a helper phage. One could also 
create a library of 10^ or more bait proteins by introducing them into an E. coli strain 
harboring a measurable reporter gene by standard transformation techniques. To cross the 
libraries one would simply infect the bait library of cells with the prey library of phage 
(using an excess of cells over phage to ensure that each cell is on average only infected by 
one phage) and look for activated expression of the reporter gene. Since one is not limited 
by transformation efficiencies, in theory one should be able to use enough cells and phage 
to ensure coverage of nearly all possible ~10^ or more pairwise combinations. 
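
The reasoning behind using an excess of cells over phage, and behind the number of infections needed to cover a library-by-library cross, can be made concrete with a simple probabilistic sketch. It assumes that phage land on cells independently (a Poisson model with mean `moi` phage per cell) and that bait-prey pairs are sampled uniformly at random; both are simplifying assumptions, and the function names and example sizes are hypothetical.

```python
import math


def single_infection_fraction(moi):
    """Among infected cells, the fraction that received exactly one phage,
    assuming phage per cell is Poisson-distributed with mean `moi`."""
    p_exactly_one = moi * math.exp(-moi)
    p_at_least_one = -math.expm1(-moi)  # 1 - exp(-moi), computed accurately
    return p_exactly_one / p_at_least_one


def expected_coverage(n_bait, n_prey, infected_cells):
    """Expected fraction of all bait x prey pairs sampled at least once,
    assuming each infected cell holds a uniformly random pair."""
    combinations = n_bait * n_prey
    return -math.expm1(-infected_cells / combinations)


if __name__ == "__main__":
    for moi in (1.0, 0.3, 0.1):
        frac = single_infection_fraction(moi)
        print(f"phage per cell={moi:.1f}  singly infected={frac:.3f}")
    # Sampling three times as many infected cells as there are pairs.
    print(f"coverage at 3x oversampling: {expected_coverage(100, 100, 30_000):.3f}")
```

Under these assumptions, a ten-fold excess of cells over phage leaves roughly 95% of infected cells with a single phagemid, and about three infected cells per possible pair are needed to sample roughly 95% of the pairs.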

The phagemid-based method is also useful for experiments requiring serial 
selection/screening (e.g., for directed evolution approaches). For example, one could 
create a library as phage, infect a reporter strain of interest, perform the selection/screen, 
and then rescue positives again as phage. This enriched pool of phage could then be 
mutagenized (e.g., by infection of and replication in a mutator strain) and then 
reintroduced into a reporter strain for the next round of selection/screening. This process 
could be continued for many cycles to obtain the desired candidates. 

In addition, phagemid rescue can be used to enrich a library when true positives are 
rare relative to the background breakthrough rate of a particular selection/screen (that is, 
spontaneously occurring false positives). As described in greater detail in Example 1 below 
(see NRE selection results), rescue of phagemids from an initial selection followed by 
reintroduction and reselection in fresh reporter strain cells can enrich for true positives 
relative to false positives whose phenotype is not linked to the presence of the phagemid. 
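
The enrichment obtained by rescue and reselection can be illustrated with simple bookkeeping. The sketch assumes that every true positive passes each reselection, and that a phagemid recovered from a false-positive cell scores again only at the spontaneous breakthrough rate, since its phenotype was not linked to the phagemid. The survivor counts and the breakthrough rate below are hypothetical numbers chosen for illustration, not results from Example 1.

```python
def reselect(true_pos, false_pos, breakthrough_rate, rounds):
    """Fraction of true positives among survivors after each round of
    phagemid rescue, reintroduction into fresh cells, and reselection."""
    fractions = []
    t, f = float(true_pos), float(false_pos)
    for _ in range(rounds):
        # Unlinked false positives recur only at the breakthrough rate.
        f *= breakthrough_rate
        fractions.append(t / (t + f))
    return fractions


if __name__ == "__main__":
    # Hypothetical first-pass survivors: 10 true and 100 false positives.
    before = 10 / (10 + 100)
    after = reselect(true_pos=10, false_pos=100,
                     breakthrough_rate=1e-6, rounds=2)
    print(f"true-positive fraction before rescue: {before:.3f}")
    for i, frac in enumerate(after, start=1):
        print(f"after reselection round {i}: {frac:.6f}")
```

With these illustrative numbers a single round of rescue and reselection moves the pool from mostly false positives to almost entirely true positives.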

The ability to easily rescue and reintroduce library phagemids also facilitates the 
analysis of potential interactors obtained from selections or screens in several ways: 1) 
Phagemid-linkage testing. An important test of whether a phagemid-encoded library 
candidate is a true positive is whether altered expression of the reporter gene is linked with 
the phagemid (that is, does the phagemid when isolated and reintroduced into the reporter 
strain still activate expression of the reporter?). Linkage testing is greatly facilitated when 
performed by the phagemid-based system. Infection of phagemid-containing cells with 
helper phage results in the selective "rescue" of only the phagemid and not other plasmids 
typically present in the ITS reporter strain. This rescue by phage infection is much faster 
than alternative protocols involving plasmid isolation followed by retransformation into an 
intermediate bacterial strain to separate the plasmid encoding the library candidate from 
other plasmids in the cell. 2) Tests of interaction specificity. Rescued phagemids can also 
be easily introduced into a number of reporter strains expressing different interaction targets 
to test their specificity of interaction. Simple infection of these reporter strains by phage is 
much easier than alternative methods involving transformation (which would require 
making all reporter strains competent and then performing multiple transformations). 3) 
Preparation of DNA for sequencing. Phage (harboring candidate phagemids) can also be 
used to infect standard cloning strains (e.g., XL1-Blue) to prepare clonal DNA for 
sequencing. Again, no transformation is necessary to effect transfer of the phagemid to a 
strain suitable for preparing plasmid DNA. Example 1 below illustrates the use of 
phagemid rescue to facilitate phagemid-linkage testing, tests of interaction specificity, and 
preparation of DNA for sequencing. 

II. Definitions 

Before further description of the invention, certain terms employed in the 
specification, examples and appended claims are, for convenience, collected here. 

The term "prokaryote" is art recognized and refers to a unicellular organism lacking 
a true nucleus and nuclear membrane, having genetic material composed of a single loop of 
naked double-stranded DNA. Prokaryotes with the exception of mycoplasmas have a rigid 
cell wall. In some systems of classification, a division of the kingdom Prokaryotae, 
Bacteria include all prokaryotic organisms that are not blue-green algae (Cyanophyceae). In 
other systems, prokaryotic organisms without a true cell wall are considered to be unrelated 
to the Bacteria and are placed in a separate class, the Mollicutes. 

The term "bacteria" is art recognized and refers to certain single-celled 
microorganisms of about 1 micrometer in diameter; most species have a rigid cell wall. 
They differ from other organisms (eukaryotes) in lacking a nucleus and membrane-bound 
organelles and also in much of their biochemistry. 

The term "eukaryote" is an art recognized term which refers to an organism whose 
cells have a distinct nucleus, multiple chromosomes, and a mitotic cycle. Eukaryotic cells 
include cells from animals, plants, and fungi, but not bacteria or algae. 

As used herein, "recombinant cells" include any cells that have been modified by 
the introduction of heterologous DNA. 

As used herein, the terms "heterologous DNA" or "heterologous nucleic acid" are 
meant to include DNA that does not occur naturally as part of the genome in which it is 
present, or DNA which is found in a location or locations in the genome that differs from 
that in which it occurs in nature, or occurs extra-chromosomally, e.g., as part of a plasmid. 

By "protein" or "polypeptide" is meant a sequence of amino acids of any length, 
constituting all or a part of a naturally-occurring polypeptide or peptide, or constituting a 
non-naturally-occurring polypeptide or peptide (e.g., a randomly generated peptide 
sequence or one of an intentionally designed collection of peptide sequences). 

The terms "chimeric", "fusion" and "composite" are used to denote a protein, 
peptide domain or nucleotide sequence or molecule containing at least two component 
portions which are mutually heterologous in the sense that they are not, otherwise, found 
directly (covalently) linked in nature. More specifically, the component portions are not 
found in the same continuous polypeptide or gene in nature, at least not in the same order or 
orientation or with the same spacing present in the chimeric protein or composite domain. 
Such materials contain components derived from at least two different proteins or genes or 
from at least two non-adjacent portions of the same protein or gene. Composite proteins, 
and DNA sequences which encode them, are recombinant in the sense that they contain at 
least two constituent portions which are not otherwise found directly linked (covalently) 
together in nature. 

By a "DNA binding domain" or "DBD" is meant a polypeptide sequence which is 
capable of directing specific polypeptide binding to a particular DNA sequence (i.e., to a 
DBD recognition element). The term "domain" in this context is not intended to be limited 
to a single discrete folding domain. Rather, consideration of a polypeptide as a DBD for 
use in the bait fusion protein can be made simply by the observation that the polypeptide 
has a specific DNA binding activity. DNA binding domains, like activation tags, can be 
derived from proteins ranging from naturally occurring proteins to completely artificial 
sequences. 

The term "activation tag" refers to a molecule capable of affecting transcriptional 
activation on its own or by assembling, or recruiting, an active polymerase complex. In 
various embodiments, the activation tag may be a polypeptide, a nucleic acid or a small 
molecule. In certain embodiments, the activation tag is an RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment 
of an RNA polymerase subunit. In other embodiments, the activation tag is a polypeptide, 
nucleic acid or small molecule that can directly interact with RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, a functional fragment of 
an RNA polymerase subunit, a molecule covalently fused to RNA polymerase, a molecule 
covalently fused to an RNA polymerase subunit, a molecule covalently fused to a 
functional fragment of RNA polymerase, or a molecule covalently fused to a functional 
fragment of an RNA polymerase subunit. In still other embodiments, the activation tag is a 
molecule (polypeptide, nucleic acid, or small molecule) which interacts indirectly with 
RNA polymerase, an RNA polymerase subunit, a functional fragment of an RNA 
polymerase, or a functional fragment of an RNA polymerase subunit, via at least one 
intermediary molecule (polypeptide, nucleic acid, or small molecule), wherein the 
intermediary molecule can functionally link the activation tag to RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment 
of an RNA polymerase subunit. Activation tags can be known sequences or molecules or 
can be derived from random libraries of polypeptides, nucleic acids, or small molecules. 






The terms "recombinant protein", "heterologous protein" and "exogenous protein" 
are used interchangeably throughout the specification and refer to a polypeptide which is 
produced by recombinant DNA techniques, wherein generally, DNA encoding the 
polypeptide is inserted into a suitable expression vector which is in turn used to transform a 
host cell to produce the heterologous protein. That is, the polypeptide is expressed from a 
heterologous nucleic acid. 

As used herein, a "reporter gene construct" is a nucleic acid that includes a "reporter 
gene" operatively linked to transcriptional regulatory sequences. Transcription of the 
reporter gene is controlled by these sequences. The activity of at least one or more of these 
control sequences is directly or indirectly regulated by a transcriptional complex recruited 
by virtue of interaction of the DBD with its binding site and between the bait and prey 
fusion proteins. The transcriptional regulatory sequences can include a promoter and other 
regulatory regions that modulate the activity of the promoter, or regulatory sequences that 
modulate the activity or efficiency of the RNA polymerase that recognizes the promoter. 
Such sequences are herein collectively referred to as transcriptional regulatory elements or 
sequences. The reporter gene construct will also include a "DBD recognition element" 
which is a nucleotide sequence that is specifically bound by the DNA binding domain of 
the bait fusion protein. The DBD recognition element is located sufficiently proximal to 
the promoter sequence of the reporter gene so as to cause increased reporter gene 
expression upon recruitment of an RNA polymerase complex by a bait fusion protein bound 
at the recognition element. 

As used herein, a "reporter gene" is a gene whose expression may be detected. For 
example, in the case of the subject flow-ITS, expression of the reporter may be detected by, 
e.g., flow cytometry and/or affinity chromatography. Reporter genes may encode any 
protein or nucleic acid that provides a cell surface marker, e.g., a surface antigen for which 
specific antibodies/ligands are available, or a protein or nucleic acid otherwise detectable 
by FACS analysis. In other embodiments, the reporter gene encodes a protein or nucleic 
acid which confers a selectable growth phenotype to the host cell. 

By "operably linked" is meant that a gene and transcriptional regulatory sequence(s) 
are connected in such a way as to permit expression of the gene in a manner dependent 
upon factors interacting with the regulatory sequence(s). In the case of the reporter gene, at 
least one DNA binding domain (DBD) recognition element will also be operably linked to 
the reporter gene such that transcription of the reporter gene will be dependent, at least in 
part, upon bait-prey complexes bound to the recognition element. As explained, however, a 
single fusion protein with a covalently attached activation tag may be used when selecting 
DBDs on their binding sites. 

The terms "basic promoter" or "minimal promoter", as used herein, are intended to 
refer to the minimal transcriptional regulatory sequence that is capable of initiating 
transcription of a selected DNA sequence to which it is operably linked. This term is 
intended to represent a promoter element providing basal transcription. 

The term "transcription factor" refers to any protein or modified form thereof that is 
involved in the initiation of transcription but which is not itself a part of the polymerase. 
Transcription factors are proteins or modified forms thereof, which interact preferentially 
with specific nucleic acid sequences, i.e., regulatory elements, and which in appropriate 
conditions stimulate transcription ("transcriptional activators") or repress transcription 
("transcriptional repressors"). Some transcription factors are active when they are in the 
form of a monomer. Alternatively, other transcription factors are active in the form of 
oligomers consisting of two or more identical proteins or different proteins (heterodimer). 
The factors have different actions during the transcription initiation: they may interact with 
other factors, with the RNA polymerase, with the entire complex, with activators, or with 
DNA. The factors are generally classifiable into two groups: (i) the general transcription 
factors, and (ii) the transcription activators. Transcription factors usually contain one or 
more regulatory domains. However, note that some constructs can use DBDs covalently 
attached to polymerase subunits. 

The term "regulatory domain" refers to any domain which regulates transcription, 
and includes both activation and repression domains. The term "activation domain" denotes 
a domain in a transcription factor which positively regulates (increases) the rate of gene 
transcription. The term "repression domain" denotes a domain in a transcription factor 
which negatively regulates (inhibits or decreases) the rate of gene transcription. 

The term "transcriptional activator" as used herein refers to a protein or protein 
complex which is capable of activating expression of a gene. Thus, as used herein, a 
transcriptional activator can be a single protein or alternatively it can be composed of 
several units at least some of which are not covalently linked to each other. A 
transcriptional activator typically has a modular structure, i.e., comprises various domains, 
such as a DNA binding domain, and one or more transcriptional activation tags. 

The term "cofactor", which is used interchangeably herein with the terms 
"co-activator", "adaptor" and "mediator", refers to proteins which either enhance or repress 
transcription in a non-gene specific manner, e.g., which lack intrinsic DNA binding 
specificity. Thus, cofactors are general effectors. Positively acting cofactors do not 
stimulate basal transcription, but enhance the response to an activator. 

A "dimerization domain" is defined as a domain that induces formation of dimers 
between two proteins having that domain, while a "tetramerization domain" is defined as a 
domain that induces formation of tetramers amongst proteins containing the tetramerization 
domain. An "oligomerization domain", generic for both dimerization and tetramerization 
domains, facilitates formation of oligomers, which can be of any subunit stoichiometry (of 
course greater than one). 

The term "interact" as used herein is meant to include detectable interactions 
between molecules. Interactions may be, for example, protein-protein, protein-nucleic acid, 
drug-protein, or drug-nucleic acid. 

By "covalently bonded" it is meant that two domains are joined by covalent bonds, 
directly or indirectly. That is, the "covalently bonded" proteins or protein moieties may be 
immediately contiguous or may be separated by stretches of one or more amino acids 
within the same fusion protein. 

By "altering the expression of the reporter gene" is meant a statistically significant 
increase or decrease in the expression of the reporter gene to the extent required for 
detection of a change in the assay being employed. It will be appreciated that the degree of 
change will vary depending upon the type of reporter gene construct or reporter gene 
expression assay being employed, as between FACS sorting and growth selection. 

The terms "fluorescently active" and "fluorescent label" refer to the ability to emit 
radiation of a given wavelength as a result of excitement with radiation of a different 
wavelength than that emitted. Typically, fluorescent reporter groups are detected by 
exciting the reporter group with a higher energy light and then detecting the emission of 
some of the absorbed energy as a lower energy light. The term is also intended herein to 
cover chemiluminescent, phosphorescent as well as fluorescent materials. The exciting 
radiation is conventionally ultraviolet or visible light but may be infrared or other 
electromagnetic radiation. 

As used herein, the term "fluorophore" is inclusive of fluorophore and fluorescent 
compounds known to be useful in flow cytometry. Preferably, the fluorophore is 
phycoerythrin (PE) or fluorescein isothiocyanate (FITC), but other useful fluorophores are 
known in the art. 

The terms "interactors", "interacting proteins" and "candidate interactors" are used 
interchangeably herein and refer to a set of proteins which are able to form complexes with 
one another, preferably non-covalent complexes. 

By "test protein" or "test polypeptide" is meant all or a portion of one of a pair of 
interacting proteins provided as part of the bait or prey fusion proteins. 

By "randomly generated" is meant sequences having no predetermined sequence; 
this is contrasted with "intentionally designed" sequences which have a DNA or protein 
sequence or motif determined prior to their synthesis. 

The terms "directed evolution" and "creation by directed evolution" mean 
bringing forth a sequence not found in nature which, e.g., encodes a novel molecule or 
DBD binding domain by mutating or randomizing genes and then imposing rationally 
designed selection conditions and pressures. This may proceed through several cycles with 
increasingly stringent selection/screening criteria. 

The term "mutagenesis" refers to techniques for the creation of a heterogeneous 
population of genes, e.g., by irradiation, chemical treatment, low fidelity replication, etc. 

By "amplification" or "clonal amplification" is meant a process whereby the density 
of host cells having a given phenotype is increased. 

The terms "pool" of polypeptides, "polypeptide library" or "combinatorial 
polypeptide library" are used interchangeably herein to indicate a variegated ensemble of 
polypeptide sequences, where the diversity of the library may result from cloning or be 
generated by mutagenesis or randomization. The terms "pool" of genes, "gene library" or 
"combinatorial gene library" have a similar meaning, indicating a variegated ensemble of 
nucleic acids. 

By "screening" is meant a process whereby a gene library is surveyed to determine 
whether there exists within this population one or more genes which encode a polypeptide 
having a particular binding characteristic(s) in the interaction trap assay. 

By "selection" is meant a process whereby candidates from a library are expressed 
in specialized cells, and these cells are subjected to growth conditions (selective conditions) 
under which only those cells in which expression of a reporter gene is measurably altered 
will survive or grow. 

The term "breakthrough false positive" or "background false positive" refers to host 
cells in which expression of the reporter gene occurs, e.g., by at least a statistically 
significant amount, in a manner which is independent of the interaction of the bait and prey 
proteins (in the case of a two hybrid assay) or the bait and DNA target sequence (in the 
case of a one hybrid assay). 

The term "zinc finger protein" or "ZFPs" or "zinc finger polypeptide" refers to 
proteins that bind to DNA, RNA and/or protein, in a sequence-specific manner, by virtue of 
a metal stabilized domain known as a zinc finger. See, for example, Miller et al. (1985) 
EMBO J. 4:1609-1614; Rhodes et al. (1993) Sci. Amer. Feb:56-65; and Klug (1999) J. 
Mol. Biol. 293:215-218. The most widely represented class of ZFPs, known as the C2H2 
ZFPs, comprises proteins that are composed of zinc fingers that contain two conserved 
cysteine residues and two conserved histidine residues. Over 10,000 C2H2 zinc fingers 
have been identified in several thousand known or putative transcription factors. Each 
C2H2 zinc finger domain comprises a conserved sequence of approximately 30 amino acids 
that contains the invariant cysteines and histidines in the following arrangement: 
-Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (SEQ ID NO: 1). In animal genomes, polynucleotide sequences 
encoding this conserved amino acid sequence motif are usually found as a series of tandem 
duplications, leading to the formation of multi-finger domains within a particular 
transcription factor. As used herein, "zinc finger protein" refers to known zinc finger 
proteins, or fragments thereof, or to novel polypeptides isolated by the methods of the 
invention. 
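
The C2H2 arrangement recited above can be expressed as a simple pattern search. The following Python sketch is illustrative only: the function name and the test sequence are assumptions introduced here, the pattern encodes nothing beyond the spacing of the invariant cysteines and histidines, and a match does not by itself establish that a sequence folds or functions as a zinc finger.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, with X = any residue (one-letter code).
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein_seq):
    """Return (start_index, matched_text) for each non-overlapping match."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical test sequence constructed to fit the spacing above.
    example = "YACPVESCDRRFSRSDELTRHIRIH"
    for start, motif in find_c2h2_motifs(example):
        print(start, motif)
```

Because the search is non-overlapping, tandem arrays of fingers are reported as successive matches along the polypeptide.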

The terms "phage vector" and "phagemid" are art-recognized and generally refer to 
a vector derived by modification of a phage genome, containing an origin of replication for 
a bacteriophage, and preferably, though optional, an origin (ori) for a bacterial plasmid. In 
certain embodiments, a library of replicable phage vectors, especially phagemids (as 
defined herein), encoding a library of fusion proteins and/or reporter gene constructs, is 
generated and used to transform suitable host cells. 

The term "helper phage" refers to a phage which is impaired or defective in its 
ability to replicate. The defect can be one which results from removal, mutation, or 
inactivation of phage genomic sequence required for phage replication. Helper phage can 
be used to infect cells harboring a phagemid resulting in the production of infectious phage 
particles primarily harboring single-stranded DNA forms of the phagemid. Examples of 
helper phage include M13KO7, VCS-M13, M13 derivatives, and f1 derivatives. 

The phrase "varying the growth conditions of the host cell," or the like, refers to 
changing or modifying any environmental factor which may affect the growth of a cell, 
including, for example, changing the composition of the growth medium, adding a drug to 
the growth medium, changing the temperature at which the cells are grown, changing the 
agitation rate to which the cells are exposed, changing the length of time the cells are 
grown, changing the amount of light to which the cells are exposed, changing the amount of 
CO2 and/or O2 to which the cells are exposed, etc. 

The term "desired expression level," or the like, refers to the level of expression of a 
reporter gene which produces a useful means for selecting a population of cells 
comprising a test polypeptide that may or may not interact with at least one other 
polypeptide or at least one nucleic acid (DNA or RNA) sequence. In various embodiments, 
a desired expression level refers to an increase, a decrease, or no change in the level of the 
reporter gene as compared to the basal level of expression of the reporter gene. In other 
embodiments, a desired expression level refers to an increase, a decrease, or no change in 
the level of the reporter gene upon application of an external factor as compared to the level 
of expression of the reporter gene before application of the external factor. The external 
factor can be anything which varies the growth conditions of the cell, as described herein, 
and in a particular embodiment refers to contacting the host cell with a test agent. 

The term "translational element" refers to any nucleic acid sequence which is 
sufficient to permit translation of an RNA sequence into a polypeptide. In certain 
embodiments, the translational element refers only to a start codon (ATG), whereas in other 
embodiments, it refers to a sequence comprising a start codon, ribosome binding sites, etc. 

The phrase "analyzed by FACS," or the like, as used herein, is meant to include 
monitoring and/or sorting of a population of cells using FACS. 

The terms "agent" or "test agent" are used herein interchangeably and are meant to 
include, but are not limited to, peptides, nucleic acids, carbohydrates, small organic 
molecules, natural product extracts, and libraries thereof. 

The term "agonize", as used herein, refers to an augmentation of the formation of a 
protein-protein or protein-DNA complex, wherein augmentation may mean an increase in 
the amount of, or the increase in the duration of, a complex. 

The term "antagonize", as used herein, refers to an inhibition of the formation of a 
protein-protein or protein-DNA complex, wherein inhibition may mean a decrease in the 
amount or duration of a complex. 

The term "tunable" or "tunable selection" refers to the ability to control the degree 
of growth advantage conferred by a reporter gene being expressed in a cell by varying the 
growth conditions of the cell. 

The term "imp" strain" refers to a strain of bacteria containing a mutation in the 
increased membrane permeability locus leading to increased permeability of the outer 
membrane of the cell (Sampson et al., Genetics 122(3): 491-501 (1989)). 

The term "differentially interact," or the like, refers to the ability of a first molecule 
(a polypeptide, nucleic acid, or small molecule) to interact with at least two other test 
molecules (polypeptides, nucleic acids, or small molecules). In various embodiments, a 
first molecule will differentially interact with two other test molecules wherein it (i) 
interacts strongly with both test molecules, (ii) interacts strongly with one of the test 
molecules and weakly with the other test molecule, or (iii) interacts weakly with both test 
molecules. 

The term "differentially modulates," or the like, as used herein, refers to the ability 
of a test agent to affect the interaction of a first molecule (a polypeptide, nucleic acid, or 
small molecule) with at least two other test molecules (polypeptides, nucleic acids, or small 
molecules). In various embodiments, a test agent will differentially modulate the 
interaction of a first molecule with two other test molecules wherein it (i) strongly affects 
the interaction of the first molecule with both test molecules, (ii) strongly affects the 
interaction of the first molecule with one of the test molecules and weakly affects the 
interaction of the first molecule with the other test molecule, or (iii) weakly affects the 
interaction of the first molecule with both test molecules. 

The term "interacts to a desired extent," or the like, refers to an interaction between 
molecules (polypeptide-polypeptide or polypeptide-nucleic acid) which results in a desired 
level of expression of a reporter gene, in accord with the methods of the invention. A 
desired extent of interaction may be a strong interaction between two molecules, a weak 
interaction between two molecules, or no interaction between two molecules. Additionally, 
a desired extent of interaction may result in an increase, a decrease, or no change in the 
level of expression of the reporter gene as compared to the basal level of expression of the 
reporter gene in accord with the various embodiments of the invention. 

The term "basal expression level" refers to the level of expression that occurs in the 
absence of a productive interaction between two polypeptides or a polypeptide and a DNA 
sequence. 



III. Exemplary Embodiments for ITS Reagents 

Before describing the various embodiments of the subject interaction trap assays, we 
first provide a generic description of the "bait" and "prey" proteins and reporter gene 
constructs used in the various assay formats. It is noted that the following description of 
particular arrangements of test polypeptide sequences in terms of being part of the bait or 
prey fusion proteins is, in general, arbitrary. As will be apparent from the description, the 
test polypeptide portions of any given pair of interacting bait and prey fusion proteins may, 
in certain embodiments, be swapped with each other. 

A. Bait protein constructs for two hybrid format 

One of the first steps in the use of the interaction trap system of the present 
invention is to construct the bait fusion protein. Sequences encoding a first interacting 
domain are cloned in-frame to a sequence encoding, depending on the embodiment, a 
known or potential (test) DNA binding domain (DBD), e.g., a polypeptide which may 
specifically bind to a defined nucleotide sequence of a reporter gene construct. A basic 
requirement for the bait fusion protein is that it alone causes little or no transcriptional 
activation of the reporter gene in the absence of an interacting prey fusion protein or DNA 
sequence. In addition, the DBD and interacting domain should not affect the activity of the 
other. (However, when selecting DBDs or their binding sites from a variegated library, the 
DBD may be fused directly to the activation domain or the polymerase subunit.) 

B. Prey protein constructs for two hybrid format 

The subject assay also utilizes a chimeric prey protein. In preferred embodiments, 
the prey fusion protein comprises: (1) a second interacting domain, capable of forming an 
intermolecular association with the first interacting domain of the bait polypeptide, and (2) 
an activation tag, such as a polymerase interacting domain or a polymerase subunit. As 
described above, protein-protein contact between the bait and prey fusion proteins (via the 
interacting domains) links the DNA-binding domain of the bait fusion protein with the 
polymerase interaction domain (or a polymerase subunit) of the prey fusion protein, 
generating a protein complex capable of directly recruiting a functional RNA polymerase 
enzyme to promoter sequences proximal to the DNA-bound bait protein, i.e., activating 
transcription of the reporter gene. 

DNA dependent RNA polymerase in E. coli and other bacteria consists of an 
enzymatic core composed of subunits α, β, and β' in the stoichiometry α2ββ', and one of 
several alternative σ factors responsible for specific promoter recognition. In one 
embodiment, the prey fusion protein includes a sufficient portion of the amino-terminal 
domain of the α subunit to permit assembly of transcriptionally active RNA polymerase 
complexes which include the prey fusion protein. The α subunit, which initiates the 
assembly of RNA polymerase by forming a dimer, has two independently folded domains 
(Ebright et al. (1995) Curr Opin Genet Dev 5:197). The larger amino-terminal domain 
(α-NTD) mediates dimerization and the subsequent assembly of the polymerase complex. The 
prey polypeptide can be fused in frame to the α-NTD, or a fragment or mutant thereof, 
which retains the ability to assemble a functional RNA polymerase complex. 

The present invention also contemplates the use of polymerase interaction domains 
containing portions of other RNA polymerase subunits or portions of molecules which 
associate with an RNA polymerase subunit or subunits. Contemporary models of the 
polymerase complex predict a substantial degree of intramolecular motion within the 
transcription complex. Movement of parts of the enzyme complex relative to each other is 
believed to be realized by structurally independent domains, such as the N-terminal and 
C-terminal domains of the α subunit described above. Accordingly, it is possible that the 
paradigm of transcriptional activation realized with fusion proteins incorporating only a 
portion of the α subunit is also applicable to fusion proteins generated with portions of other 
polymerase subunits, e.g., with portions of the β, β', ω and/or σ subunits. Portions of such 
other subunits used to generate a prey fusion protein are, like the α-NTD 
example above, useful if they provide fusion proteins which retain the ability to form active 
polymerase complexes. For example, Severinov et al. (1995) PNAS 92:4591 describes the 
ability of fragments of the β subunit (encoded by the E. coli rpoB gene) to reconstitute a 
functional polymerase enzyme. It is noted that it may be a formal requirement of 
embodiments utilizing prey fusion proteins including PIDs of the β, β', ω and/or σ subunits 
that other fragments of the subunit be provided, e.g., co-expressed, in the host cell. See also 
Dove et al. (1997) Nature 386:627. 

Additionally, given the general conservation of the polymerase subunits amongst 
bacteria, the present invention also specifically contemplates prey fusion proteins derived 
with polymerase interaction domains of RNA polymerase subunits from other bacteria, e.g., 
Staphylococcus aureus (Deora et al. (1995) Biochem Biophys Res Commun 208:610), 
Bacillus subtilis, etc. 

In an alternative embodiment, instead of a polymerase interaction domain, the prey 
fusion protein can include an activation domain of a transcriptional activator protein. The 
bait fusion protein, by forming DNA bound complexes with the prey fusion protein, can 
indirectly recruit RNA polymerase complexes to the promoter sequences of the reporter 
gene, thus activating transcription of the reporter gene. To illustrate, the activation domain 
can be derived from such transcription factors as PhoB or OmpR. The critical consideration 
in the choice of the activation domain is its ability to interact with RNA polymerase 
subunits or complexes in the host cell in such a way as to be able to activate transcription of 
the reporter gene. 

C. Bait protein constructs for one hybrid format 

In certain embodiments of the subject invention, the interaction trap assay is 
designed to detect interaction between a potential DNA binding domain and a potential 
DBD recognition element. In those embodiments, it is not necessary that the transcriptional 
activation activity be separated from the bait protein into the prey protein, as it is in the two 
hybrid format. Thus, in a one hybrid format, sequences encoding a known or potential 
(test) DNA binding domain (DBD), e.g., a polypeptide which may specifically bind to a 
defined nucleotide sequence of a reporter gene construct, are fused in frame to an activation 
domain, such as a PID. As above, the basic requirement for the bait fusion protein is that it 
alone causes little or no transcriptional activation of the reporter gene in the absence of 
interaction with the DBD recognition sequence of the reporter gene. In addition, the DBD 
and activation domain should not affect the activity of the other. 

D. Reporter gene constructs 

The level of reporter gene expression ultimately measures the end stage of the above 
described cascade of events, e.g., transcriptional modulation, and permits the isolation 
and/or amplification of ITS host cells in a manner dependent on the interaction of the bait 
and prey proteins and the transcriptional regulatory element of the reporter gene. 
Accordingly, in practicing one embodiment of the assay, a reporter gene construct is 
inserted into the reagent cell. Typically, the reporter gene construct will include one or 
more reporter genes in operative linkage with one or more transcriptional regulatory 
elements which include, or are linked to, at least one known or potential DBD recognition 
element for the DBD of the bait fusion protein. In various embodiments, the reporter gene 
construct may contain at least one, two, three, four, or five known or potential DBD 
recognition elements. Based on the teachings described herein, those of skill in the art 
could readily identify or synthesize reporter genes and transcriptional regulatory elements 
useful in the subject methods. (When testing specificity, one also may have reporters with 
binding sites that you would prefer the protein not recognize.) Further detail is provided 
below. 

IV. Exemplary Embodiments for Analysis of Large Libraries by Growth Selection 

We have discovered that use of selectable reporter genes which confer a growth 
advantage to a prokaryotic host cell, rather than merely a visual selection marker allows the 
interaction trap assay to be used to screen libraries of potential protein-protein or protein- 
DNA interactors exceeding 10^ members. In the prior art systems, lack of stringency can 
result in isolation of a significant population of non-specific interacting pairs, i.e., false 
positives. In large libraries, a high percentage of false positives can make the isolation and 
identification of true interactors from a large library time consuming, if not impossible. 

In the ITS formats of the subject invention, we have shown that the use of reporter 
genes providing a highly stringent amplification profile can in fact reduce the number of 
false positives, especially breakthrough false positives, being amplified to the point that 
large scale library screening is in fact feasible. Thus, the subject assay can be set up to 
utilize a reporter gene system that reduces the number of false positive interactions to less 
than 50% of an enriched library, and more preferably less than 25 percent, or even 10, 5 or 
1 percent. In a preferred embodiment, the assay reduces the occurrence rate of breakthrough 
false positives to less than 1:10^, and even more preferably less than 1:10^, 1:10^ or even 
1:10^^ 
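
The relationship between the breakthrough rate and the composition of an enriched library can be illustrated with simple arithmetic. The Python sketch below is an assumption-laden illustration: the function name, the library size, the number of true interactors, and the breakthrough rates are all hypothetical values chosen for the example and are not taken from this specification. It treats every non-interacting library member as surviving selection independently with probability equal to the breakthrough rate.

```python
def false_positive_fraction(library_size, true_interactors, breakthrough_rate):
    """Expected fraction of the enriched pool that is breakthrough false positives.

    Assumes every true interactor survives selection and every other library
    member survives independently with probability `breakthrough_rate`.
    """
    expected_false = (library_size - true_interactors) * breakthrough_rate
    return expected_false / (expected_false + true_interactors)


if __name__ == "__main__":
    # Hypothetical screen: 1e8 members, 10 true interactors.
    for rate in (1e-6, 1e-7, 1e-9):
        fraction = false_positive_fraction(1e8, 10, rate)
        print(f"breakthrough rate {rate:g}: {fraction:.1%} false positives")
```

Under these assumed numbers, lowering the breakthrough rate by two orders of magnitude moves the enriched pool from roughly half false positives to about one percent, which is the practical meaning of a more stringent amplification profile.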

In this embodiment of the present invention, the reporter gene is chosen on the basis 
of its ability to facilitate isolation and/or amplification of ITS cells on the basis of a 
selective growth advantage, e.g., the ability to grow, and preferably can provide a highly 
stringent amplification profile which reduces the number of false positives being amplified. 
Accordingly, in practicing one embodiment of the assay, a reporter gene construct is 
inserted into the reagent cell in order to generate a selectable growth advantage dependent 
on interaction of the bait and prey fusion proteins with each other and the regulatory 
elements of the reporter gene. Typically, the reporter gene construct will include a reporter 
gene in operative linkage with one or more transcriptional regulatory elements which 
include, or are linked to, a potential DBD recognition element for the DBD of the bait 
fusion protein, with the level of expression of the reporter gene providing the prey protein 
interaction-dependent growth advantage (or the DBD-DNA interaction when selecting for 
DNA binding). 

Based on the teachings described herein, those of skill in the art could readily 
identify or synthesize reporter genes and transcriptional regulatory elements useful in the 
subject methods. In general, the reporter gene is selected to provide a selection method 
such that cells in which the reporter gene is activated have a growth advantage. For 
example, the reporter could enhance cell viability, e.g., by relieving a cell nutritional 
requirement, and/or provide resistance to a drug. To further illustrate, examples of suitable 
reporter genes include those which encode proteins conferring antibiotic resistance to the 
host bacterial cell, though more preferably the reporter is a gene which encodes a protein 
required to complement an auxotrophic phenotype. A preferred reporter gene is the HIS3 gene, which 
permits E. coli cells bearing a deletion of the hisB gene to grow in the absence of histidine. 
3-AT, a competitive inhibitor of HIS3, can be used to increase the level of HIS3 expression 
required for growth in the absence of histidine. Thus, 3-AT can be used to increase the 
stringency of the selection. 

In bacteria, suitable positively selectable (beneficial) genes include genes involved 
in biosynthesis or drug resistance. Countless genes are potential selective markers. Certain 
of the above are involved in well-characterized biosynthetic pathways. In the simplest case, 
the cell is auxotrophic for an amino acid or nucleotide precursor, such as histidine, uracil, 
leucine, tryptophan or adenine, in the absence of activation of the reporter gene. 
Auxotrophy means the inability of the micro-organism to synthesise certain growth factors, 
for example amino acids, from simple precursors. In contrast to the corresponding wild type 
strains, auxotrophic mutants therefore do not grow on minimal medium. On the contrary, 
they require a complete medium or minimal medium supplemented with components 
necessary for growth which they cannot synthesize themselves. Activation of the ITS leads 
to synthesis of an enzyme, encoded by the reporter gene, required for biosynthesis of the 
amino acid and the cell becomes prototrophic for that amino acid (does not require an 
exogenous source). Thus the selection is for growth in the absence of that amino acid in the 
culture media. 

To further illustrate, we have discovered that, surprisingly, the HIS3 reporter gene 
can be used to rescue a prokaryotic host cell in HIS selective media with sufficient 
stringency to successfully isolate interacting pairs from a large library of variants. Lack of 
stringency in other systems can result in isolation of a significant population of non-specific 
interacting pairs, i.e., false positives. In large libraries, a high percentage of false positives 
can make the isolation and identification of true interactors time consuming, if not 
impossible. In the case of the HIS3 reporter, the use of 3-amino-triazole (3-AT), a 
competitive inhibitor of HIS3, selects for cells in which the HIS3 reporter is highly 
expressed (i.e., increases the stringency of the selection), and thereby lowers the number of 
false positives due to breakthrough in the enriched product. Using different levels of 3-AT 
allows "tuning" of the selection stringency. 
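
The way a competitive inhibitor raises the expression level required for growth can be sketched with the standard Michaelis-Menten rate law for competitive inhibition, in which the inhibitor multiplies the apparent Km by (1 + [I]/Ki). The Python sketch below is a toy model under stated assumptions, not a description of measured HIS3 kinetics: it assumes growth requires a fixed biosynthetic flux, and every parameter value (kcat, Km, Ki, substrate level, required flux) is a hypothetical placeholder in arbitrary units.

```python
def required_expression(inhibitor, flux_required=1.0, kcat=1.0,
                        substrate=1.0, km=1.0, ki=1.0):
    """Enzyme level needed to sustain `flux_required` under competitive inhibition.

    Rate law: v = kcat * E * S / (Km * (1 + I / Ki) + S), solved for E.
    All parameters are hypothetical and in arbitrary units.
    """
    apparent_km = km * (1.0 + inhibitor / ki)
    return flux_required * (apparent_km + substrate) / (kcat * substrate)


if __name__ == "__main__":
    # Higher inhibitor levels demand proportionally more reporter expression.
    for level in (0.0, 1.0, 2.0, 5.0):
        print(level, required_expression(level))
```

In this model the required enzyme level rises linearly with inhibitor concentration, so cells whose reporter is expressed only at a low breakthrough level fall below the growth threshold as the inhibitor level is raised.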

Another exemplary reporter gene which may be used in the subject assay is the 
β-lactamase system. β-lactams are antibiotics which act by interfering with cell wall 
biosynthesis in the bacteria resulting in impaired cellular function, altered cell morphology 
or lysis. Bacteria have developed the ability to resist β-lactam activity through the 
production of β-lactamases which are enzymes that irreversibly hydrolyze the amide bond 
of the β-lactam ring thus rendering the antibiotic inactive. A specific example of a 
β-lactamase enzyme is taught by Stemmer (Nature 1994 Aug 4;370(6488):389) which 
provides a variant of TEM-1 which is more resistant to cefotaxime, e.g., has a higher 
minimum inhibitory concentration. Recently, various compounds capable of inhibiting 
β-lactamase activity have been developed thus permitting antibiotic growth selection of 
various bacterial strains even in the presence of β-lactamases. This system also provides a 
tunable selection method. A bacterial cell expressing a β-lactamase enzyme as the reporter 
gene can be grown in the presence of a constant level of β-lactam antibiotic and a variable 
concentration of β-lactamase inhibitor. Control of the level of β-lactamase inhibitor 
permits control of the stringency of the growth conditions: a high concentration of inhibitor 
results in more stringent growth conditions whereas a low concentration of inhibitor results 
in less stringent growth conditions. The gene encoding the β-lactamase enzyme may be 
introduced into the bacteria such that it is constitutively or regulatably expressed. See, for 
example, Liras et al., Appl. Microbiol. Biotechnol. 54(4): 467-475 (2000); Saves et al., J. 
Biol. Chem. 270(31): 18240-18245 (1995); Thomson et al., J. Antimicrob. Chemother. 
31(5): 655-64 (1993); Maddux, Pharmacotherapy 11(2(pt 2)): 40S-50S (1991); Selzer et al., 
Nat. Struct. Biol. 7(7): 537-41 (2000); Huang et al., J. Biol. Chem. 275(20): 14964-8 
(2000); Shaywitz et al., Mol. Cell Biol. 20(24): 9409-9422 (2000). 

Any combination of β-lactamase, β-lactam antibiotic and β-lactamase inhibitor 
may be used in conjunction with the tunable selection system. Exemplary β-lactamase 
enzymes include TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, 
PSE-4 and CTX-1. Exemplary β-lactam antibiotics include penicillins, cephalosporins, 
monobactams and carbapenems. Exemplary β-lactamase inhibitors include clavulanic acid, 
sulbactam, tazobactam, brobactam, β-lactamase inhibitor peptides (BLIP) and various 
mutants of BLIP. Examples of particular combinations of β-lactam antibiotics and 
β-lactamase inhibitors which have been used include ticarcillin and clavulanate, amoxicillin 
and clavulanate and ampicillin and sulbactam. 

Thus, in preferred embodiments, the subject assay can be set up to utilize a reporter 
gene system which provides sufficient stringency for detecting interactions such that the 
number of false positive interactions is less than 50% of an enriched library, and more 
preferably less than 25 percent, or even 10, 5 or 1 percent. 

V. Exemplary embodiments for flow-ITS 

Another aspect of the present invention provides methods and reagents for 
practicing various forms of interaction trap assays using flow cytometry, preferably as a 
high throughput means (supra). The subject "flow ITS" can be used, for example, to screen 
libraries of potential protein-protein or protein-nucleic acid interactions. In preferred 
embodiments, the subject ITS system can be used to screen libraries of potential interactors 
exceeding 10^ members. See Daugherty et al., J. Immunol. Methods 243: 211-227 (2000) for 
a review on screening of cell-based libraries using flow cytometry. 

The reporter gene(s) used in this embodiment of the invention ultimately measure 
the end stage of the above described cascade of events, e.g., transcriptional modulation, 
with the level of expression of a product(s) which is fluorescently active. The reporter gene 
of the flow-ITS can be any gene that expresses a FACS detectable gene product, which may 
be RNA or protein. 
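
As a rough illustration of how a fluorescently active reporter product allows cells to be separated by flow cytometry, the following Python sketch sets a sort gate from a negative-control population and selects library cells whose signal exceeds it. The function names, the quantile-based gate, and the numeric values are assumptions made for illustration; an actual sorter applies gates in instrument software, typically on several parameters at once.

```python
def gate_threshold(control_signals, quantile=0.999):
    """Fluorescence value at the given quantile of a negative-control population."""
    ordered = sorted(control_signals)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def cells_in_gate(library_signals, threshold):
    """Indices of cells whose fluorescence exceeds the gate threshold."""
    return [i for i, signal in enumerate(library_signals) if signal > threshold]


if __name__ == "__main__":
    # Hypothetical per-cell fluorescence values (arbitrary units).
    control = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    library = [3, 6, 7, 50]
    threshold = gate_threshold(control, quantile=0.5)
    print(threshold, cells_in_gate(library, threshold))
```

Raising the quantile tightens the gate, which plays a role analogous to increasing selection stringency in the growth-based formats.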

There are at least two basic designs for the flow-ITS. In a "direct detection system" 
the reporter gene encodes a product which is readily detectable by flow cytometry due to its 
own fluorescence activity (a "direct FACS tag"). In the alternative, the flow-ITS is derived 
as an "indirect detection system" wherein the reporter gene product is detected by FACS 
upon combination with a fluorescently active agent which specifically binds to and/or is 
modified by the reporter gene product. Thus, the reporter gene may encode a "direct FACS 
tag", e.g., a fluorescent polypeptide or a polypeptide which may generate a fluorescent 
signal by enzymatic action, or an "indirect FACS tag", e.g., a polypeptide which binds 
and/or modifies a fluorescently active molecule to generate a fluorescent signal. 
Chemiluminescent reporter groups, which are for ease of reading referred to herein as 
fluorescent groups, are detected by allowing them to enter into a reaction, e.g., an 
enzymatic reaction, that results in energy in the form of light being emitted. 

The reporter gene may also be included in the construct in the form of a fusion gene 
with a gene that includes desired transcriptional regulatory sequences or exhibits other 
desirable properties. 

In one embodiment, the reporter gene encodes a fluorescently active polypeptide. 
Examples of such reporter genes include, but are not limited to, firefly luciferase (deWet et 
al. (1987), Mol. Cell. Biol. 7:725-737); bacterial luciferase (Engebrecht and Silverman 
(1984), PNAS 81: 4154-4158; Baldwin et al. (1984), Biochemistry 23: 3663-3667); 
phycobiliproteins (especially phycoerythrin); green fluorescent protein (GFP: see Valdivia 
et al. (1996) Mol Microbiol 22: 367-78; Cormack et al. (1996) Gene 173 (1 Spec No): 33-8; 
and Fey et al. (1995) Gene 165:127-130). Both the GFPs and the phycobiliproteins have 
made an important contribution in FACS sorting generally because of their high extinction 
coefficient and high quantum yield, and are accordingly preferred products of the reporter 
gene. 

A preferred embodiment utilizes a GFP which has been engineered to have a higher 
quantum yield (brighter) and/or altered excitation or emission spectra relative to wild-type 
GFPs. In general, the fluorescence levels of intracellular wild-type GFP are not bright 
enough for flow cytometry. However, a wide variety of engineered GFPs are known in the 
art which show both improved brightness and signal-to-noise ratios. For instance, the 
subject reporter gene can encode a GFP-Bex1 (S65T, V163A) or GFP-Vex1 (S202F, 
T203I, V163A). See Anderson et al. (1996) Genetics 93:8508. Other modified GFPs are 
described, for example, in U.S. Patents 5,360,728 and 5,541,309 which describe modified 
forms of apoaequorin with increased bioluminescence. 

In other embodiments, the reporter gene encodes an enzyme which, by acting on a 
substrate, produces a fluorescently active product. For instance, 
fluorescein-di-β-D-galactopyranoside (FDG) is a useful substrate for a reporter gene encoding a 
β-galactosidase in detection by flow cytometry, particularly in gram negative bacteria. See 
Plovins et al. (1994) Applied Envir Micro 60:4638; and Alvarez et al. (1993) Biotechniques 
15:974. 

In yet other embodiments, the reporter gene product is not itself sufficiently
fluorescently active for FACS purposes. Rather, the reporter gene product is one which is
able to bind to a molecule (or complex of molecules), referred to herein as a "secondary
fluorescent tag", which provides a fluorescently active moiety for detection by FACS. A
preferred criterion for the selection of the reporter gene product in these embodiments is that
the host cell, except for the reporter gene product, does not produce any other protein, etc.,
which binds to the secondary fluorescent tag at any appreciable level which would
confound the FACS sorting of the ITS cells.

In preferred embodiments of the indirect detection system, the reporter gene
encodes a protein which is associated with the cellular membrane and is at least partially
exposed to the extracellular milieu. For instance, the indirect FACS tag can be a
transmembrane protein having an extracellular domain, or an extracellular protein with
some other form of membrane localization signal which keeps the tag sequestered on the
surface of the ITS cell, e.g., a myristoyl, farnesyl or other prenyl group. The indirect
FACS tag can be a protein which is native to the host cell, but not normally expressed in the
ITS cell either because of its strain or the conditions under which the ITS is run. In other
embodiments, the indirect FACS tag is a protein which includes a portion that is non-native
to the host cell, e.g., it is a naturally occurring polypeptide sequence from another species
or it is a man-made polypeptide sequence, and it is the heterologous portion of the fusion
protein which is bound by the secondary fluorescent tag.

In an illustrative embodiment, the indirect FACS tag is a fusion protein including a
polypeptide portion which is not native to the host cell. Recombinant proteins are able to
cross bacterial membranes after the addition of bacterial leader sequences to the N-terminus
of the protein (Better et al. (1988) Science 240:1041-1043; and Skerra et al. (1988) Science
240: 1038-1041). In addition, recombinant proteins have been fused to outer membrane
proteins for surface presentation. For example, one strategy for displaying exogenous
proteins on bacterial cells comprises generating a fusion protein by inserting the exogenous
protein into cell surface exposed portions of an integral outer membrane protein (Fuchs et
al. (1991) Bio/Technology 9:1370-1372).

In selecting a bacterial cell which can display such indirect FACS tags, any
well-characterized bacterial strain will typically be suitable, provided the bacteria may be
grown in culture, and engineered to display the reporter gene product on its surface.
Among bacterial cells, the preferred display systems include Salmonella typhimurium,
Bacillus subtilis, Pseudomonas aeruginosa, Vibrio cholerae, Klebsiella pneumonia,
Neisseria gonorrhoeae, Neisseria meningitidis, Bacteroides nodosus, Moraxella bovis, and
especially Escherichia coli. Many bacterial cell surface proteins useful in the present
invention have been characterized, and works on the localization of these proteins and the
methods of determining their structure include Benz et al. (1988) Ann Rev Microbiol 42:
359-393; Balduyck et al. (1985) Biol Chem Hoppe-Seyler 366:9-14; Ehrmann et al. (1990)
PNAS 87:7574-7578; Heijne et al. (1990) Protein Engineering 4:109-112; Ladner et al.
U.S. Patent No. 5,223,409; Ladner et al. WO88/06630; Fuchs et al. (1991) Bio/Technology
9:1370-1372; and Goward et al. (1992) TIBS 18:136-140.

To further illustrate, the LamB protein of E. coli is a well understood surface protein
that can be used to generate the indirect FACS tag product of a reporter gene on the surface
of a bacterial cell (see, for example, Ronco et al. (1990) Biochemie 72:183-189; van der
Weit et al. (1990) Vaccine 8:269-277; Charabit et al. (1988) Gene 70:181-189; and Ladner
U.S. Patent No. 5,223,409). LamB of E. coli is a porin for maltose and maltodextrin
transport, and serves as the receptor for adsorption of bacteriophages λ and K10. LamB is
transported to the outer membrane if a functional N-terminal signal sequence is present
(Benson et al. (1984) PNAS 81:3830-3834). As with other cell surface proteins, LamB is
synthesized with a typical signal sequence which is subsequently removed. Thus, the
indirect FACS tag can be generated as a fusion gene of LamB, such that the resulting fusion
protein comprises a portion of LamB sufficient to anchor the protein to the cell membrane
with the indirect FACS tag fragment oriented on the extracellular side of the membrane.
Secretion of the extracellular portion of the fusion protein can be facilitated by inclusion of
the LamB signal sequence, or other suitable signal sequence, as the N-terminus of the
protein.

The E. coli LamB has also been expressed in functional form in S. typhimurium
(Harkki et al. (1987) Mol Gen Genet 209:607-611), V. cholerae (Harkki et al. (1986)
Microb Pathol 1:283-288), and K. pneumonia (Wehmeier et al. (1989) Mol Gen Genet
215:529-536), so that one could display an indirect FACS tag in any of these species as a
fusion to E. coli LamB. Alternatively, the LamB protein itself can serve as the indirect
FACS tag.

Moreover, K. pneumonia expresses a maltoporin similar to LamB which could also
be used. In P. aeruginosa, the D1 protein (a homologue of LamB) can be used (Trias et al.
(1988) Biochem Biophys Acta 938:493-496). Similarly, other bacterial surface proteins,
such as PAL, OmpA, OmpC, OmpF, OprF, Lpp-OmpA, PhoE, pilin, BtuB, FepA, VirG,
FliC, FUC, Type I pili, Pap pili, FhuA, IutA, FecA and FhuE, may be used in place of
LamB to generate the indirect FACS tag, e.g., in a bacterial cell. For a general review, see
Georgiou et al. (1997) Nature Biotech 15:29. Cell surface proteins such as OmpA, OmpF, and
OmpC are present at greater than 10^ molecules/cell, often as much as 10^ molecules/cell,
which can provide good signal-to-noise ratios in FACS.

Those skilled in the art will also readily recognize surface polypeptides in
eukaryotic cells which can suitably serve as indirect FACS tags. For instance, the indirect
FACS tag can be a subunit of the yeast agglutinin, such as AGA1 or AGA2. See, for example,
Schreuder et al. (1993) Yeast 9:399. Another useful surface protein for use as an indirect
FACS tag is the IL-8 receptor from mammalian cells.

Where the flow-ITS utilizes an indirect FACS tag, a secondary fluorescent tag must
be provided in order to label the cells for FACS. The secondary fluorescent tag can be a
fluorescently-labeled antibody or other binding moiety which specifically binds to the
indirect FACS tag on the surface of the ITS cell. Where the indirect FACS tag is a
receptor, or at least a ligand binding domain thereof, the secondary fluorescent tag can also
be a fluorescently-labeled ligand of the receptor. Such ligands can be polypeptides or small
molecules.

In general, for use in flow cytometry, the fluorescently active tag should preferably 
have the following characteristics: 

(i) the molecules of the secondary fluorescent tag must be of sufficient size and
chemical reactivity to be conjugated to a suitable fluorescent dye, or the
secondary fluorescent tag must itself be fluorescent,

(ii) after any necessary fluorescent labeling, the secondary fluorescent tag
preferably does not react with water,

(iii) after any necessary fluorescent labeling, the secondary fluorescent tag
preferably does not bind or degrade proteins in a non-specific way, and

(iv) the molecules of the secondary fluorescent tag must be sufficiently large that
attaching a suitable dye allows enough unaltered surface area (generally at
least 500 Å², excluding the atom that is connected to the linker) for binding
to the indirect FACS tag on the ITS cell.

Fluorescent groups with which the process of this invention can be used include fluorescein
derivatives (such as fluorescein isothiocyanate), coumarin derivatives (such as aminomethyl
coumarin), rhodamine derivatives (such as tetramethyl rhodamine or Texas Red), peridinin
chlorophyll complex (such as described in U.S. Pat. No. 4,876,190), and phycobiliproteins
(especially phycoerythrin).

In one preferred embodiment of the process, when the reporter group is fluorescein,
detection of the ITS cells by FACS is achieved by measuring light emitted at wavelengths
between about 520 nm and 560 nm (especially at about 520 nm), most preferably where the
excitation wavelength is about or less than 520 nm.

Chemiluminescent groups with which the subject secondary fluorescent tags can be
generated include isoluminol (or 4-aminophthalhydrazide).

In other instances, the reporter gene can encode a nucleic acid which can be detected
by flow cytometry upon interaction with a FACS label. In one embodiment, the reporter
gene can "encode" a ribozyme, and fluorescently active nucleic acid fragments
can be detected for flow sorting upon addition of an appropriately labeled substrate for the
ribozyme. For instance, the substrate nucleic acid can include a fluorogenic donor radical,
e.g., a fluorescence emitting radical, and an acceptor radical, e.g., an aromatic radical which
absorbs the fluorescence energy of the fluorogenic donor radical when the acceptor radical
and the fluorogenic donor radical are covalently held in close proximity. See, for example,
USSN 5,527,681, 5,506,115, 5,429,766, 5,424,186, and 5,316,691; and Capobianco et al.
(1992) Anal Biochem 204:96-102. For example, the substrate nucleic acid has a
fluorescence donor group such as 2-aminobenzoic acid (anthranilic acid or ABZ) or
aminomethylcoumarin (AMC) located at one position on the polymer and a fluorescence
quencher group, such as lucifer yellow, methyl red or nitrobenzo-2-oxa-1,3-diazole (NBD),
at a different position. A cleavage site for the ribozyme will be disposed between
the sites for the donor and acceptor groups. The intramolecular resonance energy transfer
from the fluorescence donor molecule to the quencher will quench the fluorescence of the
donor molecule when the two are sufficiently proximate in space, e.g., when the substrate is
intact. Upon cleavage of the substrate, however, the quencher is separated from the donor
group, leaving behind a fluorescent fragment. Thus, expression of the ribozyme results in
cleavage of the substrate nucleic acid, and dequenching of the fluorescent group. Similar
embodiments can be generated for peptide-based substrates of enzymes.
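
The distance dependence underlying this quenching can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-quencher separation and R0 is the separation at which half of the donor's excitation energy is transferred. The following is a minimal sketch only: the function names, the Förster radius of 5 nm and the example distances are illustrative assumptions and are not taken from the specification.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Fraction of donor excitation energy transferred to the quencher.

    r_nm  : donor-quencher separation (assumed value, nm)
    r0_nm : Forster radius of the donor/quencher pair (assumed value, nm)
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def relative_donor_signal(r_nm: float, r0_nm: float) -> float:
    """Donor fluorescence remaining, relative to an unquenched donor."""
    return 1.0 - fret_efficiency(r_nm, r0_nm)


if __name__ == "__main__":
    R0 = 5.0  # hypothetical Forster radius, nm
    # Intact substrate: donor and quencher held close together.
    print("intact :", relative_donor_signal(2.0, R0))
    # Cleaved substrate: fragments diffuse apart, quenching is lost.
    print("cleaved:", relative_donor_signal(50.0, R0))
```

Because the efficiency falls off with the sixth power of distance, cleavage of the substrate behaves almost as an on/off switch for the donor signal, which is what makes such substrates usable as FACS labels.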

In certain embodiments, the flow-ITS can be designed to detect proteins which
disrupt the interaction of two proteins. For instance, cDNA libraries can be screened for
products which disrupt the binding of such protein pairs as cyclins and cyclin-dependent
kinases. To further illustrate, the bait and prey proteins can be generated using known
interactors. The cDNA library can be expressed as a third recombinant protein. Loss of
expression of the reporter gene indicates the expression of a gene encoding a protein which
disrupts the interaction of the bait and prey proteins. Such loss would register, in the
flow-ITS, as loss of a fluorescent signal in the FACS. In order to avoid potentially confounding
results of such embodiments, the flow-ITS format can be modified slightly to provide a
"reverse flow-ITS". In the reverse ITS, the reporter gene encodes a transcriptional
repressor which is expressed upon interaction of the bait and prey proteins. However, the
host cell also includes a second reporter gene which, but for an operator sequence
responsive to the repressor protein produced by the first reporter gene, would otherwise be
expressed as a FACS tag detectable in the FACS step of the present method. Thus, the
gene product of the first reporter gene regulates expression of the second reporter gene, and the
expression of the latter provides a means for indirectly scoring, by FACS analysis, for the
expression of the former. Essentially, the first reporter gene can be seen as a signal
inverter.

In this exemplary system, the bait and prey proteins positively regulate expression
of the first reporter gene. Accordingly, where the first reporter gene encodes a repressor of
expression of the second reporter gene, relieving expression of the first reporter gene by
inhibiting the formation of complexes between the bait and prey proteins concomitantly
relieves inhibition of the second reporter gene. For example, the first reporter gene can
include the coding sequences for λcI. The second reporter gene can accordingly encode a
direct or indirect FACS tag, and is under the control of a promoter which is constitutively
active, but can be repressed by λcI. In the absence of a polypeptide which inhibits the
interaction of the bait and prey proteins, the λcI protein is expressed. In turn, that protein
represses expression of the second reporter gene. However, an agent which disrupts
binding of the bait and prey proteins results in a decrease in λcI expression, and
consequently an increase in expression of the second reporter gene as λcI repression is
relieved. Hence, the signal is inverted.
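
The logic of this signal inversion can be summarized as a small truth table. The sketch below treats every step as a simple on/off switch, which is an idealization assumed purely for illustration (real expression levels are graded); the function and parameter names are hypothetical.

```python
def facs_tag_expressed(bait_prey_can_interact: bool,
                       disruptor_present: bool,
                       reverse_format: bool = True) -> bool:
    """Idealized on/off model of the forward and reverse flow-ITS.

    Forward format: the bait-prey complex directly drives the FACS tag.
    Reverse format: the complex drives a repressor (first reporter gene),
    which shuts off an otherwise constitutive FACS tag (second reporter gene).
    """
    complex_formed = bait_prey_can_interact and not disruptor_present
    if not reverse_format:
        return complex_formed
    repressor_expressed = complex_formed   # first reporter gene
    return not repressor_expressed         # second reporter gene


if __name__ == "__main__":
    for disruptor in (False, True):
        forward = facs_tag_expressed(True, disruptor, reverse_format=False)
        reverse = facs_tag_expressed(True, disruptor, reverse_format=True)
        print(f"disruptor={disruptor}: forward tag={forward}, reverse tag={reverse}")
```

In the reverse format a disrupting library member produces a gain of fluorescence rather than a loss, so the cells of interest are sorted as positives instead of being inferred from a missing signal.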

Still another consideration in generating the reporter gene construct concerns the
placement of the DBD recognition element relative to the reporter gene and other
transcriptional elements with which it is associated. In most embodiments, it will be
desirable to position the recognition element such that on its own it does not significantly
activate transcription from the promoter. In some instances, the axial position of the DBD
relative to the promoter sequences can be important.

In certain embodiments, the sensitivity of the ITS can be enhanced for detecting
weak protein-protein interactions by placing the DBD recognition sequence at a position
permitting secondary interactions (if any) between other portions of the bait fusion protein
and the RNA polymerase complex. For example, an apparent synergistic effect was
observed when the λ operator was moved close to or at its normal position (Dove et al.,
supra). While not wishing to be bound by any particular theory, this synergism is
speculated to be the result of a bait-prey interaction and a second interaction between the DBD of
λcI and a second polymerase subunit (a).

It will also be understood by those skilled in the art that the sensitivity to the
strength of the interactions between the bait and prey proteins can be "tuned" by adjusting
the sequence of the recognition element. For example, the use of a strong λ operator
instead of a weak one can improve the sensitivity of the assay to weak bait-prey interactions, as
well as help to overcome lack of dimerization if no dimerization signals are included in the
bait fusion protein.

The flow sorting cutoff, e.g., the strength of the fluorescent signal required for
gating of cells through the sorter, can also be used to tune the system with respect to the
strength of the interactions for which it generally selects.



A. Use of Multiple Reporter Genes

In particular embodiments, it may be desirable to provide two or more reporter gene
constructs, particularly reporter genes encoding products with different emission or
excitation spectra (Hawley et al., Biotechniques 30: 1028-1034 (2001)). The reporter genes
can both encode direct FACS tags, indirect FACS tags, or a combination thereof. One or
more of the reporter genes could also encode a polypeptide which can be used in the
pre-flow enrichment step described below.

The simultaneous monitoring of two or more reporter genes (whether provided on
the same or separate plasmids) can be used for at least 2 purposes: 1) to reduce the number
of false positives; and 2) to ensure specificity of interaction pairs. For example, when
selecting DBDs, one might select a protein that recognizes sites in one reporter construct
but that does not bind as well to sites in the other.

There are currently available, from commercial sources, fluorescent proteins that
have distinct emission spectra (e.g. DsRed (RFP), EYFP, EGFP, ECFP, EBFP). Using
some of these fluorescent proteins and commercially available FACS equipment it is
possible, in principle, to simultaneously and independently measure up to five distinct
fluorescent reporter genes. There are also commercially available fluorescent proteins
which have similar emission spectra but distinct excitation spectra (e.g. EGFP and GFPuv).
Modifications to FACS equipment that enable the separate measurement of the
fluorescence of a single cell when excited by different wavelengths (as described in
Anderson et al., PNAS 93: 8508-8511 (1996)) coupled with the use of additional reporter
genes with similar emission spectra and distinct excitation spectra could further increase
the number of FACS tags that could be independently measured. One possible caveat with
using more than one of these proteins in a single cell is that the commercially available
genes that encode some of the proteins have very similar DNA sequences: having regions
with very similar sequences in the same cell could have undesired effects upon the reporter
constructs (due, for example, to recombination). This problem can be easily overcome
because the genetic code is redundant: mutations can be made to the offending DNA
sequences that do not change the amino acid sequence in the expressed protein.
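
As a minimal sketch of such silent recoding, the Python fragment below swaps each codon for a synonymous alternative and confirms that the encoded protein is unchanged. The standard genetic code is used; the helper names, the "first alternative in alphabetical order" substitution rule and the short example sequence are illustrative assumptions, and a practical design would also weigh the codon usage of the host.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode_synonymous(dna: str) -> str:
    """Replace each codon with a different synonymous codon where one exists."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        alternatives = sorted(c for c, aa in CODON_TABLE.items()
                              if aa == CODON_TABLE[codon] and c != codon)
        out.append(alternatives[0] if alternatives else codon)
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    original = "ATGGTGAGCAAGGGCGAGGAG"  # short illustrative coding sequence
    recoded = recode_synonymous(original)
    assert translate(recoded) == translate(original)
    print(recoded, identity(original, recoded))
```

Applying the recoding to one of two closely related reporter genes lowers their nucleotide identity, and with it the length of uninterrupted homology available for recombination, while leaving both proteins as they were.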

In certain embodiments in which the subject flow-ITS is being used to identify a
DNA binding domain (as described in further detail below), multiple reporter gene
constructs can be used in order to facilitate isolation of domains with specific DNA binding
activity. For example, the ITS host cell can include one or more reporter genes having
transcriptional regulatory sequences for which a DNA binding domain is sought. At the
same time, the cells can also include one or more reporter genes, encoding different FACS
markers than above, under the control of transcriptional regulatory sequences which the
DBD being sought should not bind to or activate expression from. Thus, cells encoding
and expressing desired candidates can be isolated on the basis of differential expression of
the reporter genes. This could be used to obtain proteins with desired site specificities or
desired binding constants.

In certain embodiments it may be desirable to monitor the interactions of a DBD
with a number of DNA sites greater than the number of independently measurable FACS
tags in a given system. This could be accomplished by having multiple reporter constructs
(on the same or different vectors) in which some of the DNA sites control the expression of
separate copies of the same FACS tag. This would make it impossible to
independently measure all of the interactions between the DBD and each of the sites, but in
some cases it is not necessary to independently monitor all of the interactions. For
example, the desired DNA binding site could be operably linked to EGFP while a number of
point mutants of the DNA binding site could each be operably linked to a copy of RFP. In
this way, DBDs that interact with the target sequence, but that interact with NONE of the
mutants of the target sequence, could be obtained by selecting for cells that express a very
high amount of EGFP AND a very low amount of RFP.

To further illustrate, Figure 7 shows an exemplary construct containing two
different DNA sites (T11 binding site and Zif268 binding site) to which DBDs that bind
differentially to these sites are desired (i.e. DBDs which bind to the T11 site and not the
Zif268 site, or vice-versa). Increased expression of EGFP, caused by the bait protein
binding to the Zif268 site, provides a FACS sortable signal. Increased expression of RFP,
caused by the bait protein binding to the T11 site, also provides a FACS sortable signal.
Expression of EGFP and RFP can be independently detected either sequentially (in separate
selection steps) or simultaneously (in the same selection step). Thus, in a simultaneous
mode, the FACS machine can be programmed to gate on the detection of EGFP and RFP,
selecting only those cells which are positive for EGFP and negative for RFP (or vice-versa).
The use of multiple selection criteria also could be implemented by combining growth-rate
selections or affinity-based cell sorting (using one set of reporters) with FACS-based
sorting (using another set of reporters).
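
The simultaneous mode amounts to a two-channel gate. The following is a minimal sketch assuming each recorded event carries one EGFP and one RFP intensity; the `Event` type, the thresholds and the example values are hypothetical and would in practice be set from control populations on the instrument in use.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One cell as seen by the cytometer (arbitrary fluorescence units)."""
    cell_id: int
    egfp: float
    rfp: float


def gate_egfp_pos_rfp_neg(events: List[Event],
                          egfp_min: float,
                          rfp_max: float) -> List[Event]:
    """Keep cells that are EGFP-positive AND RFP-negative."""
    return [e for e in events if e.egfp >= egfp_min and e.rfp <= rfp_max]


if __name__ == "__main__":
    events = [
        Event(1, egfp=950.0, rfp=12.0),   # binds only the EGFP-linked site
        Event(2, egfp=900.0, rfp=800.0),  # binds both sites (non-specific)
        Event(3, egfp=15.0, rfp=10.0),    # binds neither site
        Event(4, egfp=20.0, rfp=700.0),   # binds only the RFP-linked site
    ]
    selected = gate_egfp_pos_rfp_neg(events, egfp_min=500.0, rfp_max=50.0)
    print([e.cell_id for e in selected])
```

Swapping the two thresholds' roles gives the "vice-versa" selection, and applying the two conditions in separate sorting passes corresponds to the sequential mode.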



B. Fluorescence Activated Cell Sorting of ITS cells

Fluorescence activated cell sorting techniques and equipment are well known in the
art and are readily adapted for use in the subject assay. In recent years, optical/electronic
instrumentation for detecting fluorescent labels on or in cells has become more
sophisticated. For example, flow cytometry can be used to measure the amount of



WO 01/88197



PCT/US01/15718






fluorescent label on individual cells at a rate exceeding 100,000 cells per second and isolate
desired cells to high purity at a rate exceeding 70,000 cells per second. These instruments
can excite fluorescence at many wavelengths of the UV, visible, and near IR regions of the
spectrum.

In general, the flow cytometer for use in the present invention is constructed in such
a way that ITS cells in suspension are introduced one at a time into an interrogation
volume. Within this volume the cells are illuminated, generally by a laser, to excite the
fluorescence tag associated with the cells. The fluorescence is then separated on the basis
of its color, through the use of optical filters, and then detected and quantified by the
electronics. The signals measured by each of these detectors, representing fluorescence at
different wavelengths, are often referred to in the art as "fluorescence channels".

If only one fluorescence channel is being monitored, the results of this interrogation
can be displayed in the form of histograms which represent the distributions of cells in the
population examined. If two or more fluorescence channels are being monitored
simultaneously, the results of this interrogation can be displayed in the form of one or more
two-dimensional dot plots where each dot in the plot represents a single cell and the dot is
drawn in the two-dimensional space so that the dot's position with respect to the x axis
indicates the intensity of the cell's signal in the first fluorescence channel and the dot's
position with respect to the y axis indicates the intensity of the cell's signal in the second
fluorescence channel. Many tens of thousands of cells may be interrogated per second,
resulting in a very rapid description of the cell population.

The ITS cells are selectively isolated or sorted to high purity as they pass through
this system on the basis of their fluorescence profile. If the cells are being sorted on the
basis of a single fluorescence channel, a lower limit and an upper limit are drawn on the
histogram for that fluorescence channel and all cells having a signal which falls between the
lower and upper limits are isolated. If the cells are being sorted on the basis of two
fluorescence channels, a polygon is drawn on the two dimensional dot plot for those two
fluorescence channels and all cells that have signals that fall within the polygon are
isolated. If cells are being sorted on the basis of three or more fluorescence channels,
polygons are drawn on each of the relevant dot plots and cells falling within all of the
relevant polygons are isolated. FACS equipment is also usually equipped to measure two
non-fluorescent channels (i.e. channels at the same wavelength as the excitation
wavelength) which are referred to in the art as "forward scatter" and "side scatter". These
parameters are often used in the sorting criteria much as the fluorescence channels are used.
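
The polygon gating described above can be sketched with a standard ray-casting point-in-polygon test, with a cell being kept only if it falls inside every polygon drawn on the relevant dot plots. This is an illustrative sketch rather than the logic of any particular instrument; the channel names ("fl1", "fl2", "fl3") and polygon coordinates are assumptions, and points lying exactly on a polygon edge are not treated specially.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Gate = Tuple[str, str, Sequence[Point]]  # (x channel, y channel, polygon vertices)


def in_polygon(x: float, y: float, vertices: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sort_cells(cells: List[Dict[str, float]], gates: List[Gate]) -> List[Dict[str, float]]:
    """Keep cells that fall inside all of the supplied polygon gates."""
    return [c for c in cells
            if all(in_polygon(c[cx], c[cy], poly) for cx, cy, poly in gates)]


if __name__ == "__main__":
    square = [(100.0, 100.0), (1000.0, 100.0), (1000.0, 1000.0), (100.0, 1000.0)]
    gates = [("fl1", "fl2", square), ("fl1", "fl3", square)]
    cells = [
        {"fl1": 500.0, "fl2": 500.0, "fl3": 400.0},  # inside both gates
        {"fl1": 500.0, "fl2": 500.0, "fl3": 50.0},   # fails the second gate
        {"fl1": 50.0, "fl2": 500.0, "fl3": 400.0},   # fails both gates
    ]
    print(len(sort_cells(cells, gates)))
```

Forward and side scatter can be gated in the same way by adding a polygon on those two channels to the list of gates.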
In the case where the desired cells are rare in the population (less than 1 in 10"*) it is
often necessary to perform multiple rounds of sorting to achieve a high purity of positive
cells. Genetically identical cells have a distribution of fluorescent signals and at a certain
frequency some cells which do not contain an ITS interaction will have a signal consistent
with that of a positive cell (i.e. a cell containing an ITS interaction) by mere chance. As
described in Daugherty, PS et al., Protein Engineering 11, p825-832 (1998), one can isolate a
population of cells from the initial library that have fluorescence signals consistent with
the desired cells, amplify this new population, and use the resulting amplified population in
subsequent rounds of sorting. This process is repeated until the population has attained the
desired purity of positive cells.
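
The effect of repeated sort-and-amplify rounds can be estimated with simple arithmetic. In the sketch below, each round retains a fraction `recovery` of the true positives and lets through a fraction `false_positive_rate` of the negatives, and amplification between rounds is assumed to preserve the proportions of the two populations. The model, the function names and all numerical values are illustrative assumptions, not figures from the specification or from the cited reference.

```python
from typing import Optional


def purity_after_round(purity: float, recovery: float, false_positive_rate: float) -> float:
    """Fraction of positives in the sorted pool after one round of sorting."""
    kept_pos = purity * recovery
    kept_neg = (1.0 - purity) * false_positive_rate
    return kept_pos / (kept_pos + kept_neg)


def rounds_to_purity(initial_purity: float,
                     recovery: float,
                     false_positive_rate: float,
                     target: float,
                     max_rounds: int = 50) -> Optional[int]:
    """Number of rounds needed to reach `target` purity, or None if not reached."""
    purity = initial_purity
    for n in range(1, max_rounds + 1):
        purity = purity_after_round(purity, recovery, false_positive_rate)
        if purity >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical library: 1 positive per 100,000 cells, 80% recovery of
    # positives per sort, 0.1% of negatives falling in the gate by chance.
    purity = 1e-5
    for round_no in range(1, 4):
        purity = purity_after_round(purity, recovery=0.8, false_positive_rate=1e-3)
        print(f"round {round_no}: purity = {purity:.4f}")
```

Under this model each round multiplies the odds of a cell being positive by `recovery / false_positive_rate`, which is why a small number of rounds can suffice even for very rare positives, provided the gate is stringent.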

If the growth conditions can be varied so that cells containing an ITS interaction no
longer have an elevated fluorescent signal, it is possible to perform multiple rounds of
sorting under different conditions to retain cells that contain an ITS interaction while
discarding cells which have an elevated fluorescent signal due to spurious genetic
mutations. As described in Valdivia, RH and Falkow, S, Science 277, p2007-2011 (1997),
one can first isolate a population of cells having an elevated fluorescent signal under
conditions in which the desired cells give an elevated fluorescent signal. This new
population of cells is then placed under conditions where the desired cells will no
longer have an elevated fluorescent signal, and the cells from this new population
that no longer have an elevated fluorescent signal are isolated, thus discarding cells that had an elevated
fluorescent signal for spurious reasons.

The level of fluorescence resulting from various levels of expression of the reporter
gene can be compared to the level of fluorescence resulting from background expression of
the reporter gene in a substantially identical cell that lacks heterologous DNA, such as the
gene encoding the prey fusion protein. Any statistically or otherwise significant difference
in the amount of transcription indicates that the prey fusion protein interacts with the bait
fusion protein. Other controls include mutant bait proteins (in protein-protein interaction
formats) and the use of DBD elements that disrupt interaction, to name but a few.

Another consideration which the practitioner of the subject assay must bear in mind
is that bacteria, marine plankton and plant cells frequently exhibit a strong natural
autofluorescence from chlorophyll or other pigments, e.g. phycobiliproteins. Thus,
practicing the subject flow-ITS requires that the autofluorescence of the host cell be
accounted for as background, particularly where the FACS tag is detected at wavelengths
above 600 nm.
C. Pre-flow Enrichment Affinity Purification or Growth Rate Selection

In certain embodiments of the subject assay, the ITS cells are subjected to a
pre-flow enrichment step in which the ITS cells are first subjected to an affinity separation step
before being subjected to FACS separation. By this step, high throughput separation of
large initial populations of ITS cells can be carried out, e.g., initial ITS cell populations
exceeding 1 0^^-10^ 5 cells per day using conventional columns.

In this step, ITS cells that express a particular cell surface protein are identified and
isolated in an affinity separation step. To accomplish this, the ITS cells include a reporter
gene which encodes a surface FACS tag protein. Upon development of the interaction trap,
the ITS cells are applied to an immobilized matrix which includes a moiety that interacts
with the surface FACS tag protein. In this manner, ITS cells expressing the surface FACS
tag can be sequestered on the matrix and thereby separated from ITS cells which do not
express at least a certain threshold level of the surface FACS tag. The surface FACS tag
can be a cell surface protein which also serves as an indirect FACS tag for the FACS step.
Alternatively, the surface FACS tag can be a product of a second reporter gene, e.g., the
cells include at least two reporter genes, one which provides a surface FACS tag for
affinity enrichment and one which provides a direct or indirect FACS tag.

The immobilized matrix can include an antibody or other binding moiety which
specifically binds to the surface FACS tag of the ITS cell. Where the surface FACS tag is a
receptor, or at least a ligand binding domain thereof, the immobilized matrix can include a
ligand of the receptor. Such ligands can be polypeptides or small molecules. The portion
of the matrix which binds to the surface FACS tag on the ITS cells is, for ease, referred to
collectively herein as the "binding agent".

With respect to affinity chromatography, it will be generally understood by those
skilled in the art that a great number of chromatography techniques can be adapted for use
in the present invention, ranging from column chromatography to batch elution. Typically,
the binding agent is immobilized (reversibly or irreversibly) on an insoluble carrier, such as
sepharose or polyacrylamide beads. The population of ITS cells is applied to the affinity
matrix under conditions compatible with the binding of the surface FACS tag to the binding
agent. The population is then fractionated by washing with a solute that does not greatly
affect specific binding of the surface FACS tag and binding agent, but which substantially
disrupts any non-specific binding of the ITS cells to the matrix. A certain degree of control
can be exerted over the binding characteristics of the ITS cells recovered from the cell
culture by adjusting the conditions of the binding incubation and subsequent washing. The
temperature, pH, ionic strength, divalent cation concentration, and the volume and duration
of the washing can select for ITS cells within a particular range of expression of the surface
FACS tag.

After "washing" to remove non-specifically bound ITS cells, when desired,
specifically bound ITS cells can be eluted by either specific desorption (using excess
surface FACS tag) or non-specific desorption (using pH, polarity reducing agents, or
chaotropic agents). In preferred embodiments, the elution protocol does not kill the
organism used as the ITS cell such that the enriched population of ITS cells can be further
amplified by reproduction. The list of potential eluants includes salts (such as those in
which one of the counter ions is Na+, Rb+, SO42-, H2PO4-, citrate, K+, Li+, Cs+,
HSO4-, CO32-, Ca2+, Sr2+, Cl-, PO43-, HCO3-, Mg2+, Ba2+, Br-, HPO42-, or acetate),
acid, heat, and, when available, soluble forms of the target antigen (or analogs thereof).
Neutral solutes, such as ethanol, acetone, ether, or urea, are examples of other agents useful
for eluting the bound ITS cells.

In preferred embodiments, affinity enriched ITS cells can be iteratively amplified
and subjected to further rounds of affinity separation until enrichment of the desired
binding activity is detected. In certain embodiments, the specifically bound ITS cells,
especially bacterial cells, need not be eluted per se, but rather, the matrix bound ITS cells
can be used directly to inoculate a suitable growth media for amplification. Cells obtained
with this protocol may, if desired, be used for subsequent flow selection studies using
one or more reporter constructs.

In another embodiment, high-gradient magnetic cell separation (MACS) 
techniques can be used to fractionate the ITS cell population. The MACS System (Miltenyi 
Biotec, Inc., Sunnyvale, CA) utilizes a powerful magnet designed to extract cells that are 
specifically coated with ferrous microbeads (50 nm in diameter) that are coupled to 
secondary antibodies, streptavidin or biotin. For instance, if a biotinylated primary 
antibody directed against a reporter surface FACS tag protein is used, the addition of the 
streptavidin beads will bind the subset of cells expressing the surface FACS tag. The ITS 
cells can be contacted, e.g., in batch, with the microbeads. The microbead coated cells can 
then be passed through a column surrounded by a large magnet. The coated cells are 
retained and the other cell types pass through the column. The column may, optionally, be 
subjected to a wash step. The bound cells are released when the magnet is removed, and are 
then collected. This cell separation system can be used to enrich for or deplete a subpopulation 
of cells within the mixture. To further illustrate, a biotinylated antibody directed against the 
surface FACS tag can be incubated with the ITS cells for a period of time sufficient for, 
e.g., antibody binding to the surface FACS tag to reach equilibrium. The antibody/cell 
complexes can then be captured on an immobilized matrix derivatized with streptavidin, 
such as the MACS streptavidin-conjugated super-paramagnetic microbeads (Miltenyi 
Biotec). A mixture of cells labeled with biotin-conjugated antibodies (e.g., against the 
surface FACS tag) is passed through the streptavidin column which is surrounded by a 
powerful rare earth magnet such as a MACS separator (Miltenyi Biotec). The ITS cells 
which express the surface FACS tag will be differentially retained on the column relative to 
cells which do not express the surface FACS tag. By removing the column from the 
magnetic field, the labeled ITS cells can be eluted from the column, e.g., as the "magnetic 
fraction". See, for example, DiNicola et al. (1996) Bone Marrow Transplant 18:1117. 

In general, the affinity enrichment step will sacrifice some specificity for higher 
throughput. Conventional columns are typically capable of retaining about 10^ cells. 
However, the specificity of most such columns will typically be in the range of about 50 
percent. This means that about 5 x 10^ cells with the desired phenotype will be retained on 
the column. If one assumes that a particular "interaction event" in a cDNA library occurs 
infrequently (about 1 in 10^), then one should be able to pass 5 x 10^^ cells through a single 
column. Assuming an average flow rate of about 5 x 10^^ cells per minute, it would take 
just under 17 hours to pass 5 x 10^^ cells through one column. 
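
The throughput reasoning above can be sketched as simple arithmetic. The following Python 
fragment is a minimal illustration only: the function name and all four input magnitudes are 
hypothetical placeholders chosen for the example, not figures taken from this specification. 

```python
def column_run_estimate(capacity, specificity, event_frequency, flow_rate_per_min):
    """Estimate one affinity-column run.

    capacity          -- total cells the column can retain (assumed value)
    specificity       -- fraction of retained cells with the desired phenotype
    event_frequency   -- fraction of library cells showing the interaction event
    flow_rate_per_min -- cells passed through the column per minute
    Returns (desired cells retained, cells that can be loaded, run time in hours).
    """
    desired_retained = capacity * specificity
    # Cells that can be loaded before the desired cells alone fill their share.
    cells_to_load = desired_retained / event_frequency
    hours = cells_to_load / flow_rate_per_min / 60.0
    return desired_retained, cells_to_load, hours


# Hypothetical placeholder magnitudes, for illustration only.
retained, load, hours = column_run_estimate(
    capacity=1e8, specificity=0.5, event_frequency=1e-3, flow_rate_per_min=5e7
)
print(f"retained={retained:.1e} load={load:.1e} hours={hours:.1f}")
```

Changing any one input shows how column capacity, specificity, and flow rate trade off against 
the length of a single run. 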



D. General Applicability of Flow-ITS approaches 

We note that all of the Flow-ITS strategies described in this application are 
applicable not only to prokaryotic cells but also to yeast, mammalian, and other eukaryotic 
cells. 



VI. Exemplary Methods for Generating Libraries 

The variegated libraries of the subject method, be their diversity at the level of a 
coding sequence for a portion of one or both of the bait and prey proteins or the DBD 
recognition sequence of a reporter gene, can be obtained from naturally occurring 
sources or be the product of random or semi-random mutagenesis or synthesis with random or 
semi-random segments. 

For instance, coding sequences can be members of a DNA expression library (e.g., a 
cDNA or synthetic DNA library, either random or intentionally biased) that are fused 
in-frame to generate a variegated library of bait or prey proteins. 

In an exemplary embodiment, cDNAs may be constructed from any mRNA 
population and inserted into an equivalent expression vector. Such a library of choice may 
be constructed de novo using commercially available kits (e.g., from Stratagene, La Jolla, 
CA) or using well established preparative procedures (see, for example, Current Protocols 
in Molecular Biology, Eds. Ausubel et al. John Wiley & Sons: 1992). Alternatively, a 
number of cDNA libraries (from a number of different organisms) are publicly and 
commercially available; sources of libraries include, e.g., Clontech (Palo Alto, CA) and 
Stratagene (La Jolla, CA). It is also noted that prey polypeptides need not be naturally 
occurring full-length proteins. In preferred embodiments, prey proteins are encoded by 
synthetic DNA sequences, are the products of randomly generated open reading frames, are 
open reading frames synthesized with an intentional sequence bias, or are portions thereof. 

It will be appreciated by those skilled in the art that many variations of the prey and 
bait fusion proteins can be constructed and should be considered within the scope of the 
present invention. For example, it will be understood that, for screening polypeptide 
libraries, the identity of the prey polypeptide can be fixed and the bait protein can be varied 
to generate the library. Indeed, in certain embodiments it will be desirable to derive the 
prey fusion protein with a fixed prey polypeptide rather than a variegated library on the 
grounds that the single prey fusion protein can be easily tested for its ability to be 
assembled into a functional RNA polymerase enzyme. Moreover, where the prey fusion 
protein is derived with a polymerase interaction domain, the bait fusion protein is likely to 
be less sensitive to variations caused by the different peptides of the library than is the prey 
fusion protein. In such embodiments, a variegated bait polypeptide library can be used to 
create a library of bait fusion proteins to be tested for interaction with a particular prey 
protein. 

There are many ways by which libraries of mutagenized sequences can be generated from a 
degenerate oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence 
can be carried out in an automatic DNA synthesizer, and the synthetic genes then ligated 
into an appropriate expression vector. The purpose of a degenerate set of genes is to 
provide, in one mixture, all of the sequences encoding the desired set of potential 
sequences. The synthesis of degenerate oligonucleotides is well known in the art (see, for 
example, Narang, SA (1983) Tetrahedron 39:3; Itakura et al. (1981) Recombinant DNA, 
Proc 3rd Cleveland Sympos. Macromolecules, ed. AG Walton, Amsterdam: Elsevier 
pp273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; Itakura et al. (1984) Science 
198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such techniques have been employed 
in the directed evolution of other proteins (see, for example, Scott et al. (1990) Science 
249:386-390; Roberts et al. (1992) PNAS 89:2429-2433; Devlin et al. (1990) Science 
249:404-406; Cwirla et al. (1990) PNAS 87:6378-6382; as well as U.S. Patents Nos. 5,223,409, 
5,198,346, and 5,096,815). 
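
To make concrete the idea that a degenerate oligonucleotide provides, in one mixture, all of 
the encoded sequences, the following Python sketch expands a degenerate sequence written 
in the standard IUPAC nucleotide ambiguity codes. The function names and the example 
oligonucleotides are hypothetical and chosen only for illustration. 

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo):
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    pools = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*pools)]


def library_size(oligo):
    """Count the encoded sequences without enumerating them."""
    size = 1
    for base in oligo.upper():
        size *= len(IUPAC[base])
    return size


# Hypothetical examples: one fully specified codon family and one randomized codon.
print(expand_degenerate("GCN"))
print(library_size("NNK"))
```

Counting rather than enumerating is the practical choice for long randomized segments, since 
the number of encoded sequences grows multiplicatively with each degenerate position. 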

Alternatives to the above combinatorial mutagenesis also exist. For example, 
libraries of potential DNA binding domains can be generated using alanine 
scanning mutagenesis and the like (Ruf et al. (1994) Biochemistry 33:1565-1572; Wang et 
al. (1994) J. Biol. Chem. 269:3095-3099; Balint et al. (1993) Gene 137:109-118; Grodberg 
et al. (1993) Eur. J. Biochem. 218:597-601; Nagashima et al. (1993) J. Biol. Chem. 
268:2888-2892; Lowman et al. (1991) Biochemistry 30:10832-10838; and Cunningham et 
al. (1989) Science 244:1081-1085), by linker scanning mutagenesis (Gustin et al. (1993) 
Virology 193:653-660; Brown et al. (1992) Mol. Cell Biol. 12:2644-2652; McKnight et al. 
(1982) Science 232:316); by saturation mutagenesis (Meyers et al. (1986) Science 232:613); 
by PCR mutagenesis (Leung et al. (1989) Method Cell Mol Biol 1:11-19); by in vitro DNA 
shuffling (Stemmer ref.); or by random mutagenesis (Miller et al. (1992) A Short Course in 
Bacterial Genetics, CSHL Press, Cold Spring Harbor, NY; and Greener et al. (1994) 
Strategies in Mol Biol 7:32-34). 

A. Directed Evolution Approaches 

Moreover, in a method of directed evolution, identified interacting pairs can be 
improved by additional rounds of mutagenesis, selection, and amplification, e.g., 
diversity can be introduced into one or both of the identified interacting pair, and the 
resulting library screened according to the present invention. The goal may be, for 
instance, to use such a process to optimize the binding characteristics, e.g., for tighter 
binders and/or better selectivity in binding. Diversity can be introduced by most any 
standard mutagenesis technique, such as by irradiation, chemical treatment, low fidelity 
replication, use of randomized PCR primers, etc. 

The flow-ITS embodiment of the subject assay is particularly well suited for 
directed evolution applications. For instance, the ease with which small samples can be 
obtained at intermediate points permits the practitioner to assess the progress of, for 
example, a randomization step or counter-selection step. The ability to tune the 
fluorescence cutoff values for gating cells and to use reporters with different sites also 
permits the user to readily adjust the stringency of the isolation step from one round of 
directed evolution to the next. 
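
The adjustment of gating stringency between rounds can be pictured with a small Python 
sketch. It is a simplified illustration under stated assumptions: the function name, the 
fluorescence values, and the fractions kept in each round are all hypothetical, and a real 
sorter applies gates to measured event data rather than to a list of numbers. 

```python
def gate(fluorescence, top_fraction):
    """Keep the brightest fraction of cells.

    Returns the fluorescence cutoff that was applied and the values at or above it.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(fluorescence, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    cutoff = ranked[keep - 1]
    return cutoff, [value for value in fluorescence if value >= cutoff]


# Hypothetical reporter intensities for ten cells.
population = [10, 50, 200, 400, 800, 1600, 30, 70, 90, 120]

# A permissive gate in an early round, a stricter gate in a later round.
for fraction in (0.5, 0.2):
    cutoff, selected = gate(population, fraction)
    print(fraction, cutoff, selected)
```

Lowering the retained fraction raises the cutoff, which is one simple way to express a more 
stringent isolation step in a later round. 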

VII. Exemplary ITS Embodiments for Detecting DNA-Protein Interactions 

In certain preferred embodiments, various embodiments of the subject method 
can be used to identify or optimize DNA-protein interactions. For example, the subject 
method can be used to identify mutant or composite DNA binding domains having desired 
sequence binding preferences. It can also be used to identify DNA sequences which are 
selectively bound by a given DNA binding protein and/or to determine the sequence 
specificity of a DNA binding protein. 

DNA-binding proteins, such as transcription factors, are critical regulators of gene 
expression. For example, transcriptional regulatory proteins are known to play a key role in 
cellular signal transduction pathways which convert extracellular signals into altered gene 
expression (Curran and Franza, (1988) Cell 55:395-397). DNA-binding proteins also play 
critical roles in the control of cell growth and in the expression of viral and bacterial genes. 
A large number of biological and clinical protocols, including among others, gene therapy, 
production of biological materials, and biological research, depend on the ability to elicit 
specific and high-level expression of genes encoding RNAs or proteins of therapeutic, 
commercial, or experimental value. Such gene expression is dependent on protein-DNA 
interactions. 

Attempts have been made to change the specificity of DNA-binding proteins. Those 
attempts rely primarily on strategies involving mutagenesis of these proteins at sites 
expected to be important for DNA recognition, and the resulting variants have often been 
selected via phage 
display (see, for example, Rebar and Pabo, (1994) Science 263:671-673; Jamieson et al. 
(1994) Biochemistry 33:5689-5695; Suckow et al. (1994) Nuc Acids Res 22:2198-2208; 
Greisman and Pabo, (1997) Science 275:657-661). This strategy may not always be 
efficient or possible with some DNA-binding domains because of limitations imposed by 
their three-dimensional structure, mode of docking to DNA, or special requirements of 
phage display. In other cases it may not be sufficient to achieve important objectives 
discussed below. Therefore, it is desirable to have a strategy which can utilize many 
different DNA-binding domains and can combine them as required for DNA recognition 
and gene regulation. 

In certain embodiments, the subject methods can be used to alter the DNA binding 
specificity of a known DNA binding protein. In other embodiments, the subject method 
can be used to generate novel composite DNA binding domains by combinatorially 
combining various motifs. The appended examples illustrate this aspect of the invention. 
The most widely used domain within protein transcription factors appears to be the zinc 
finger (Zf) motif. This is an independently folded zinc-containing mini-domain which can 
be used in a modular fashion to achieve sequence-specific recognition of DNA (see, for 
example, Klug (1993) Gene 135:83-92; Rebar and Pabo (1994), supra; Jamieson et al. 
(1994) Biochemistry 33:5689-5695; Choo et al. (1994) PNAS 91:11163-11167; Wu et al. 
(1995) PNAS 92:344-348; Segal et al. (1999) PNAS 96:2758-2763; Greisman and Pabo 
(1997), supra). Variant zinc fingers with new DNA binding specificities have been 
selected from large randomized libraries using phage display. Herein we show that our 
bacterial-based ITS can be used to isolate zinc finger variants from a large random library. 

In still other embodiments, the regulatory sequence can be provided in a 
combinatorial format, e.g., to provide a library of potential target DNA sequences. Those 
sequences which are bound by a DNA binding domain can be identified in the library. 

For example, the method can be used to identify DNA-protein interactions by the 
steps of providing a host cell which contains a target gene encoding a growth selective 
marker or other selectable marker, operably linked to a target DNA sequence. The cell is 
also engineered to include a first chimeric gene which encodes a first fusion protein 
including (a) a first interacting domain, and (b) a test DNA binding domain. The cell also 
includes a second chimeric gene encoding a second fusion protein including (a) a second 
interacting domain that binds to the first interacting domain, and (b) an activation tag (such 
as a polymerase interaction domain) which activates transcription of the selective marker 
gene when localized in the vicinity of the target DNA sequence. One or both of the test 
DNA binding domains and/or the target DNA sequence are provided in the host cell 
populations as variegated libraries (with respect to sequence) to yield a library complexity 
of at least 10^ members. Cells in which interaction of a test DNA binding domain and a 
target DNA sequence occurs can be selected and/or amplified based on the resulting growth 
trait conferred by the growth selective marker or based on cell sorting methods. 

As described above, the ITS is set up with a bait fusion protein having a first 
interacting domain and a known or potential (test) DNA binding domain (DBD), e.g., a 
polypeptide which may specifically bind to a defined nucleotide sequence. 

In embodiments wherein the target DNA sequence is being varied, the DBD portion 
of the bait fusion protein can be derived using all, or a DNA binding portion, of a 
transcriptional regulatory protein, e.g., of either a transcriptional activator or transcriptional 
repressor, which retains the ability to selectively bind to particular nucleotide sequences. 

In embodiments wherein the system is derived with a variegated library of DNA 
binding domains, the DBDs can be, for example: a collection of naturally occurring DNA 
binding domains; a collection of mutagenized DNA binding domains, i.e., altered by point 
mutation, deletion or addition or randomized synthesis of relevant segments; or a collection 
of composite DNA binding domains derived from combinatorial assembly of various DNA 
binding elements or a randomized polypeptide sequence attached to other DNA binding 
modules. 

The interacting domain can be any polypeptide sequence for which there is a known 
protein binding partner. It may be, for example, a dimerization or other oligomerization 
motif. Such a domain can be a constitutive oligomerization domain, or an inducible 
oligomerization domain, i.e., a domain mediating oligomerization only in the presence of a 
third molecule, such as a small organic molecule. Examples of constitutive oligomerization 
domains include leucine zippers. 

Examples of inducible oligomerization domains include FK506 and cyclosporin 
binding domains of FK506 binding proteins and cyclophilins, and the rapamycin binding 
domain of FRAP. Such inducible oligomerization domains are referred to herein as "ligand 
binding domains" and are further described herein under the section entitled accordingly. 

A dimerization domain is defined herein as a sequence of amino acids capable of 
forming homodimers or heterodimers. One example of a dimerization domain is the leucine 
zipper (LZ) element. Leucine zippers have been identified, generally, as stretches of about 
35 amino acids containing 4-5 leucine residues separated from each other by six amino 
acids (Maniatis and Abel (1989) Nature 341:24-25). Exemplary leucine zippers occur in a 
variety of eukaryotic DNA binding proteins, such as GCN4, C/EBP, c-Fos, c-Jun, c-Myc 
and c-Max. Other dimerization domains include helix-loop-helix domains (Murre, C. et al. 
(1989) Cell 58:537-544). Dimerization domains may also be selected from other proteins, 
such as the retinoic acid receptor, the thyroid hormone receptor or other nuclear hormone 
receptors (Kurokawa et al. (1993) Genes Dev. 7:1423-1435) or from the yeast transcription 
factors GAL4 and HAP1 (Marmorstein et al. (1992) Nature 356:408-414; Zhang et al. 
(1993) Proc. Natl. Acad. Sci. USA 90:2851-2855). Dimerization domains are further 
described in U.S. Pat. No. 5,624,818 by Eisenman. 
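
The leucine spacing described above (leucines separated from each other by six amino acids, 
i.e., a leucine at every seventh position) can be expressed as a simple pattern search. The 
Python sketch below is a crude heuristic offered for illustration only; the function name 
and the toy sequence are hypothetical, and a match does not by itself establish that a 
region forms a functional zipper. 

```python
import re


def find_leucine_heptad_repeats(protein, min_leucines=4):
    """Find runs with a leucine at every seventh position.

    Returns a list of (start index, matched segment) for each non-overlapping run
    containing at least `min_leucines` leucines spaced six residues apart.
    """
    if min_leucines < 2:
        raise ValueError("min_leucines must be at least 2")
    # One leucine, then (six arbitrary residues + leucine) repeated.
    pattern = re.compile(r"L(?:.{6}L){%d,}" % (min_leucines - 1))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


# Toy sequence (not a real protein): four leucines spaced seven residues apart.
toy = "MK" + "LAAAAAA" * 3 + "L" + "GG"
print(find_leucine_heptad_repeats(toy))
```

Raising `min_leucines` to 5 restricts the search to the longer end of the 4-5 leucine range 
mentioned in the text. 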

In another embodiment, the oligomerization domain is a tetramerization domain. 
For example, the tetramerization domain is the E. coli lactose repressor tetramerization 
domain (amino acids 46-360; Chakerian et al. (1991) J. Biol. Chem. 266:1371; Alberti et al. 
(1993) EMBO J. 12:3227; and Lewis et al. (1996) Science 271:1247). Thus, the inclusion of 
a tetramerization domain in a transcriptional activator allows four activation domains to be 
complexed together and form a transcriptional activator complex. Furthermore, more than 
one activation unit can be linked to one tetramerization domain, to thereby form a 
transcriptional activator complex comprising more than 4 activation units. 

In another embodiment, the tetramerization domain is that from a p53 protein. The 
p53 tetramerization domain maps to residues 322-355 of p53 (Wang et al. (1994) Mol. Cell. 
Biol. 14:5182; Clore et al. (1994) Science 265:386) and is further described in U.S. Pat. No. 
5,573,925 by Halazonetis. 

Other exemplary suitable tetramerization domains include artificial tetramerization 
domains, such as variants of the GCN4 leucine zipper that form tetramers (Alberti et al. 
(1993) EMBO J. 12:3227-3236; Harbury et al. (1993) Science 262:1401-1407; Krylov et al. 
(1994) EMBO J. 13:2849-2861). One of skill in the art could readily select alternate 
tetramerization domains. For example, the tetrameric variant of GCN4 leucine zipper 
described in Harbury et al. (1993), supra, has isoleucines at positions d of the coiled coil 
and leucines at positions a, in contrast to the original zipper which has leucines and valines, 
respectively. 

In addition, the art also provides a variety of techniques for identifying other 
naturally occurring oligomerization domains, as well as oligomerization domains derived 
from mutant or otherwise artificial sequences. See, for example, Zeng et al. (1997) Gene 
185:245; O'Shea et al. (1992) Cell 68:699-708; Krylov et al. [cited above]. 

In another embodiment, libraries of potential DNA binding domains are created 
from the assembly of DNA binding motifs from various transcription factors, e.g., resulting 
in DNA binding domains which may have novel DNA binding specificities. Such DNA 
binding domains, referred to herein as "composite DNA binding domains", can be designed 
to specifically recognize unique binding sites. For example, a DNA binding domain can be 
constructed that comprises DNA binding regions from a zinc finger protein and a 
homeobox protein. 

The DNA sequences recognized by a chimeric protein containing a composite 
DNA-binding domain can be determined using the subject method, e.g., by library vs. 
library screening, or the proteins can be selected by their specificity toward a desired 
sequence. A desirable nucleic acid recognition sequence consists of a nucleotide sequence 
spanning at least ten, preferably eleven, and more preferably twelve or more bases. The 
component binding portions (putative or demonstrated) within the nucleotide sequence need 
not be fully contiguous; they may be interspersed with "spacer" base pairs that need not be 
directly contacted by the chimeric protein but rather impose proper spacing between the 
nucleic acid subsites recognized by each module. These sequences should not impart 
expression to linked genes when introduced into cells in the absence of the engineered 
DNA-binding protein. 

In preferred embodiments, the subject method can be used to identify a nucleotide 
sequence that is recognized by a transcriptional activator protein containing a composite 
DNA-binding region, preferably recognized with high affinity and specificity. Several 
methods can be used. For instance, high-affinity binding sites for the protein or protein 
complex can be selected from a large pool of random DNA sequences, and their sequences 
determined. From this collection of sequences, individual sequences with desirable 
characteristics (i.e., high affinity and specificity for composite protein, minimal affinity for 
individual subdomains) are selected for use. Alternatively, the collection of sequences is 
used to derive a consensus sequence that carries the favored base pairs at each position. 
Such a consensus sequence is synthesized and tested (see below) to confirm that it has an 
appropriate level of affinity and specificity. 
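
Deriving a consensus sequence that carries the favored base at each position can be sketched 
as a per-column majority vote over aligned, equal-length sites. The Python fragment below is 
a minimal illustration; the function name, the alphabetical tie-breaking rule, and the 
example sites are assumptions introduced here and are not taken from the specification. 

```python
from collections import Counter


def consensus(sequences):
    """Return the most frequent base at each position of aligned sequences.

    Ties are broken alphabetically (an arbitrary choice made for this sketch).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*sequences):
        counts = Counter(column)
        # sorted() fixes the order so max() picks the alphabetically first on ties.
        base, _ = max(sorted(counts.items()), key=lambda item: item[1])
        result.append(base)
    return "".join(result)


# Hypothetical selected binding sites, already aligned.
selected_sites = ["TGACGTCA", "TGACGTCA", "TGATGTCA", "TGACGTAA"]
print(consensus(selected_sites))
```

A position weight matrix would retain more information than a single consensus string, but 
the majority vote matches the simple procedure described in the paragraph above. 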

An alternative approach to generating novel sequence specificities is to use 
databases of known homologs of the DBD to predict amino acid substitutions that will alter 
binding. For example, analysis of databases of zinc finger sequences has been used to alter 
the binding specificity of a zinc finger (Desjarlais and Berg (1993) Proc. Natl. Acad. Sci. 
USA 90, 2256-2260). 

A further and powerful approach is random mutagenesis of amino acid residues 
which may contact the DNA, followed by screening or selection for the desired novel 
specificity. For example, phage display of the three fingers of Zif268 (including the two 
incorporated into ZFHD1) has been described, and random mutagenesis and selection has 
been used to alter the specificity and affinity of the fingers (Rebar and Pabo (1994) Science 
263, 671-673; Jamieson et al. (1994) Biochemistry 33, 5689-5695; Choo and Klug 
(1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo and Klug (1994) Proc. Natl. 
Acad. Sci. USA 91, 11168-11172; Choo et al. (1994) Nature 372, 642-645; Wu et al. 
(1995) Proc. Natl. Acad. Sci. USA 92, 344-348). These mutants can be incorporated into 
ZFHD1 to provide new composite DNA binding regions with novel nucleotide sequence 
specificities. Other DBDs may be similarly altered. If structural information is not 
available, general mutagenesis strategies can be used to scan the entire domain for desirable 
mutations: for example alanine-scanning mutagenesis (Cunningham and Wells (1989) 
Science 244, 1081-1085), PCR misincorporation mutagenesis (see, e.g., Cadwell and Joyce 
(1992) PCR Meth. Applic. 2, 28-33), and 'DNA shuffling' (Stemmer ref.); or by random 
mutagenesis (Miller et al. (1992) A Short Course in Bacterial Genetics, CSHL Press, Cold 
Spring Harbor, NY; and Greener et al. (1994) Strategies in Mol Biol 7:32-34). These 
techniques produce libraries of random mutants, or sets of single mutants, that can then be 
readily searched by screening or selection approaches such as phage display. 

In all these approaches, mutagenesis can be carried out directly on the DNA binding 
region, or on the individual subdomain of interest in its natural or other protein context. In 
the latter case, the engineered component domain with new nucleotide sequence specificity 
may be subsequently incorporated into the composite DNA binding region in place of the 
starting component. The new DNA binding specificity may be wholly or partially different 
from that of the initial protein: for example, if the desired binding specificity contains (a) 
subsite(s) for known DNA binding subdomains, other subdomains can be mutated to 
recognize adjacent sequences and then combined with the natural domain to yield a 
composite DNA binding region with the desired specificity. 

Randomization and selection strategies may be used to incorporate other desirable 
properties into the composite DNA binding regions in addition to altered nucleotide 
recognition specificity, by imposing an appropriate in vitro selective pressure (for review 
see Clackson and Wells (1994) Trends Biotech. 12, 173-184). These include improved 
affinity, improved specificity, improved stability and improved resistance to proteolytic 
degradation. 

As appropriate, the DNA binding motif used to generate the bait fusion protein can 
include oligomerization motifs. As known in the art, certain transcriptional regulators 
dimerize, with dimerization promoting cooperative binding of the two monomers to their 
cognate recognition elements. 

The use of recombinant DNA techniques to create a fusion gene, with the 
translational product being the desired bait fusion protein, is well known in the art. 
Essentially, the joining of various DNA fragments coding for different polypeptide 
sequences is performed in accordance with conventional techniques, employing 
blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for 
appropriate termini, filling in of cohesive ends as appropriate, alkaline phosphatase 
treatment to avoid undesirable joining, and enzymatic ligation. Alternatively, the fusion 
gene can be synthesized by conventional techniques including automated DNA 
synthesizers. In another method, PCR amplification of gene fragments can be carried out 
using anchor primers which give rise to complementary overhangs between two 
consecutive gene fragments which can subsequently be annealed to generate a chimeric 
gene sequence (see, for example, Current Protocols in Molecular Biology, Eds. Ausubel et 
al. John Wiley & Sons: 1992). 

It may be necessary in some instances to introduce an unstructured polypeptide 
linker region between the DNA binding domain of the fusion protein and the bait 
polypeptide sequence. Where the bait fusion protein also includes dimerization sequences, 
it may be preferable to situate the linker between the dimerization sequences and the bait 
polypeptide. The linker can facilitate enhanced flexibility of the fusion protein allowing the 
DBD to freely interact with a responsive element, and, if present, the dimerization 
sequences to make inter-protein contacts. The linker can also reduce steric hindrance 
between the two fragments, and allow appropriate interaction of the bait polypeptide 
portion with a prey polypeptide component of the interaction trap system. The linker can 
also facilitate the appropriate folding of each fragment. The linker can be of 
natural origin, such as a sequence determined to exist in random coil between two domains 
of a protein. An exemplary linker sequence is the linker found between the C-terminal and 
N-terminal domains of the RNA polymerase α subunit. Other examples of naturally 
occurring linkers include linkers found in the λcI and LexA proteins. Alternatively, the 
linker can be of synthetic origin. For instance, the sequence (Gly4Ser)3 can be used as a 
synthetic unstructured linker. Linkers of this type are described in Huston et al. (1988) 
PNAS 85:4879; and U.S. Patent No. 5,091,513, both incorporated by reference herein. 
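
As a small worked example of the synthetic linker named above, the following Python sketch 
writes out (Gly4Ser)3 in one-letter amino acid code and places it between two segments of a 
fusion protein. The function names and the two placeholder segment strings are hypothetical; 
they stand in for a DNA binding domain and a bait polypeptide and are not real sequences. 

```python
def gly_ser_linker(repeats=3, glycines=4):
    """Build a (Gly_n Ser)_m linker in one-letter code; defaults give (Gly4Ser)3."""
    if repeats < 1 or glycines < 1:
        raise ValueError("repeats and glycines must be positive")
    return ("G" * glycines + "S") * repeats


def fuse(n_terminal, c_terminal, linker):
    """Join two polypeptide segments with a linker between them."""
    return n_terminal + linker + c_terminal


linker = gly_ser_linker()
# Placeholder strings standing in for the two domains (not real sequences).
fusion = fuse("MDBDPLACEHOLDER", "BAITPLACEHOLDER", linker)
print(linker, len(linker))
print(fusion)
```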

A. Design of Composite DNA-binding Regions. 

Each composite DNA-binding region consists of a continuous polypeptide region 
containing two or more component heterologous polypeptide portions which are 
individually capable of recognizing (i.e., binding to) specific nucleotide sequences. The 
individual component portions may be separated by a linker comprising one or more amino 
acid residues intended to permit the simultaneous contact of each component polypeptide 
portion with the DNA target. The combined action of the composite DNA-binding region 
formed by the component DNA-binding modules may result in the addition of the free 
energy decrement of each set of interactions. The effect is to achieve a DNA-protein 
interaction of very high affinity and specificity. This goal is often best achieved by 
combining component polypeptide regions that bind DNA poorly on their own, that is with 
low affinity, insufficient for functional recognition of DNA under typical conditions in a 
mammalian cell. Because the hybrid protein exhibits affinity for the composite site several 
orders of magnitude higher than the affinities of the individual sub-domains for their 
subsites, the protein preferentially (preferably exclusively) occupies the "composite" site 
which typically comprises a nucleotide sequence spanning the individual DNA sequence 
recognized by the individual component polypeptide portions of the composite 
DNA-binding region. 
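
The statement that the free energy decrements add can be illustrated numerically. Under the 
idealized assumption that the binding free energies of the component modules are strictly 
additive (ignoring linker strain, entropic costs, and any cooperativity), and using the 
standard relation between free energy and dissociation constant at a 1 M standard state, 
the composite dissociation constant is the product of the component constants. The Python 
sketch below uses hypothetical component affinities chosen only for illustration. 

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol: dG = RT ln(Kd), 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def kd_from_free_energy(delta_g, temperature_k=298.15):
    """Invert the relation above to recover a dissociation constant in molar."""
    return math.exp(delta_g / (R_KCAL * temperature_k))


def composite_kd(component_kds, temperature_k=298.15):
    """Idealized composite Kd assuming the component free energies simply add."""
    total = sum(binding_free_energy(kd, temperature_k) for kd in component_kds)
    return kd_from_free_energy(total, temperature_k)


# Hypothetical components, each binding weakly on its own.
components = [1e-6, 1e-6]
for kd in components:
    print(f"component Kd={kd:.0e} M, dG={binding_free_energy(kd):.2f} kcal/mol")
print(f"idealized composite Kd={composite_kd(components):.1e} M")
```

Real fusions usually fall short of this ideal, so the computed value is best read as an upper 
bound on the affinity gain rather than a prediction. 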

Suitable component DNA-binding polypeptides for incorporation into a composite 
region have one or more, preferably more, of the following properties. They bind DNA as 
monomers, although dimers can be accommodated. They should have modest affinities for 
DNA, with dissociation constants preferably in the range of 10"^ to 10"^ M. They should 
optimally belong to a class of DNA-binding domains whose structure and interaction with 
DNA are well understood and therefore amenable to manipulation. For gene therapy 
applications, they are preferably derived from human proteins. 

B. Examples of suitable component DNA-binding domains. 

DNA-binding domains with appropriate DNA binding properties may be selected 
from several different types of natural DNA-binding proteins. One class comprises proteins 
that normally bind DNA only in conjunction with auxiliary DNA-binding proteins, usually 
in a cooperative fashion, where both proteins contact DNA and each protein contacts the 
other. Examples of this class include the homeodomain proteins, many of which bind DNA 
with low affinity and poor specificity, but act with high levels of specificity in vivo due to 
interactions with partner DNA-binding proteins. 

The homeodomain is a highly conserved DNA-binding domain which has been 
found in hundreds of transcription factors (Scott et al., Biochim. Biophys. Acta 989:25-48 
(1989) and Rosenfeld, Genes Dev. 5:897-907 (1991)). The regulatory function of a 
homeodomain protein derives from the specificity of its interactions with DNA and 
presumably with components of the basic transcriptional machinery, such as RNA 
polymerase or accessory transcription factors (Laughon, Biochemistry 30(48):11357 
(1991)). A typical homeodomain comprises an approximately 61-amino acid residue 
polypeptide chain, folded into three alpha helices, which binds to DNA. 

A second class comprises proteins in which the DNA-binding domain is comprised
of multiple reiterated modules that cooperate to achieve high-affinity binding of DNA. An
example is the Cys2His2 class of zinc-finger proteins, which typically contain a tandem
array of from two or three to dozens of zinc-finger modules. Each module contains an
alpha-helix capable of contacting a three to five base-pair stretch of DNA. Typically, at
least three zinc-fingers are required for high-affinity DNA binding. Therefore, one or two
zinc-fingers constitute a low-affinity DNA-binding domain with suitable properties for use
as a component in this invention. Examples of proteins of the C2H2 class include TFIIIA,
Zif268, Gli, and SRE-ZBP. (These and other proteins and DNA sequences referred to
herein are well known in the art. Their sources and sequences are known.)

The zinc finger motif, of the type first discovered in transcription factor IIIA (Miller
et al., EMBO J. 4:1609 (1985)), offers an attractive framework for studies of transcription
factors with novel DNA-binding specificities. The zinc finger is one of the most common
eukaryotic DNA-binding motifs (Jacobs, EMBO J. 11:4507 (1992)), and this family of
proteins can recognize a diverse set of DNA sequences (Pavletich and Pabo, Science
261:1701 (1993)). Crystallographic studies of the Zif268-DNA complex and other zinc
finger-DNA complexes show that residues at four positions within each finger make most
of the base contacts (with occasional contacts from two other positions), and there has
been some discussion about rules that may explain zinc finger-DNA recognition (Desjarlais
and Berg, PNAS 89:7345 (1992) and Klevit, Science 252:1367 (1991)). However, studies
have also shown that zinc fingers can dock against DNA in a variety of ways (Pavletich and
Pabo (1993) and Fairall et al., Nature 366:483 (1993)).

A third general class comprises proteins that themselves contain multiple
independent DNA-binding domains. Often, any one of these domains is insufficient to
mediate high-affinity DNA recognition, and cooperation with a covalently linked partner
domain is required. Examples include the POU class, such as Oct-1, Oct-2 and Pit-1, which
contain both a homeodomain and a POU-specific domain; HNF1 and certain Pax proteins
(examples: Pax-3, Pax-6), which contain both a homeodomain and a paired box/domain.

From a structural perspective, DNA-binding proteins containing domains suitable
for use as polypeptide components of a composite DNA-binding region may be classified
as DNA-binding proteins with a helix-turn-helix structural design, including, but not
limited to, MAT 1, MAT 2, MAT a1, Antennapedia, Ultrabithorax, Engrailed, Paired, Fushi
tarazu, HOX, Unc86, and the previously noted Oct1, Oct2 and Pit; zinc finger proteins,
such as Zif268, SWI5, Krüppel and Hunchback; steroid receptors; DNA-binding proteins
with the helix-loop-helix structural design, such as Daughterless, Achaete-scute (T3),
MyoD, E12 and E47; and other helical motifs like the leucine-zipper, which includes
GCN4, C/EBP, c-Fos/c-Jun and JunB. The amino acid sequences of the component DNA-
binding domains may be naturally-occurring or non-naturally-occurring (or modified).

The choice of component DNA-binding domains may be influenced by a number of
considerations, including the species, system and ultimately the cell type in which the
optimized DBD is to be expressed; the feasibility of incorporation into a chimeric protein,
as may be shown by modeling; and the desired application or utility. The choice of DNA-
binding domains may also be influenced by the individual DNA sequence specificity of the
domain and the ability of the domain to interact with other proteins or to be influenced by a
particular cellular regulatory pathway. Preferably, the distance between domain termini is
relatively short to facilitate use of the shortest possible linker or no linker. The DNA-
binding domains can be isolated from a naturally-occurring protein, or may be a synthetic
molecule based in whole or in part on a naturally-occurring domain.

An additional strategy for obtaining component DNA-binding domains using the
subject method is to modify an existing DNA-binding domain to reduce its affinity for
DNA into the appropriate range. For example, a homeodomain such as that derived from
the human transcription factor Phox1 may be modified by substitution of the glutamine
residue at position 50 of the homeodomain. Substitutions at this position remove or change
an important point of contact between the protein and one or two base pairs of the 6-bp
DNA sequence recognized by the protein. Thus, such substitutions reduce the free energy
of binding and the affinity of the interaction with this sequence and may or may not
simultaneously increase the affinity for other sequences. Such a reduction in affinity is
sufficient to effectively eliminate occupancy of the natural target site by this protein when
produced at typical levels in mammalian cells. But it would allow this domain to contribute
binding energy to and therefore cooperate with a second linked DNA-binding domain.
Other domains that are amenable to this type of manipulation include the paired box, the zinc-
finger class represented by steroid hormone receptors, the myb domain, and the ets domain.
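
The energetic argument in the preceding paragraph can be illustrated with a small numerical sketch. It assumes the standard relation between binding free energy and dissociation constant (delta G = RT ln Kd, with a 1 M standard state) and, as an idealization that the text does not state, that the binding free energies of two linked domains simply add with no linker penalty. The function names and the example Kd values are hypothetical and chosen only for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) for a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(kd_molar)


def composite_kd(kd_a, kd_b, temp_k=298.15):
    """Idealized Kd of two linked domains whose binding free energies add.

    Assumption (not from the source): perfect additivity, no linker cost.
    """
    dg_total = binding_free_energy(kd_a, temp_k) + binding_free_energy(kd_b, temp_k)
    return math.exp(dg_total / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Two hypothetical weak domains that would rarely occupy a site alone.
    weak_a, weak_b = 1e-4, 1e-5
    print("dG domain A (kcal/mol):", round(binding_free_energy(weak_a), 2))
    print("dG domain B (kcal/mol):", round(binding_free_energy(weak_b), 2))
    print("idealized composite Kd (M):", composite_kd(weak_a, weak_b))
```

Under these assumptions the composite Kd is the product of the component Kd values, which is why a domain that is individually too weak to occupy its natural site can still contribute usefully when linked to a partner. Real linked domains generally fall short of this ideal.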

C. Design of linker sequence for covalently linked composite DBDs.

The continuous polypeptide span of the composite DNA-binding domain may
contain the component polypeptide modules linked directly end-to-end or linked indirectly
via an intervening amino acid or peptide linker. A linker moiety may be designed or
selected empirically to permit the independent interaction of each component DNA-binding
domain with DNA without steric interference. A linker may also be selected or designed so
as to impose specific spacing and orientation on the DNA-binding domains. The linker
amino acids may be derived from endogenous flanking peptide sequence of the component
domains or may comprise one or more heterologous amino acids. Linkers may be designed
by modeling or identified by experimental trial.

The linker may be any amino acid sequence that results in linkage of the component
domains such that they retain the ability to bind their respective nucleotide sequences. In
some embodiments it is preferable that the design involve an arrangement of domains
which requires the linker to span a relatively short distance, preferably less than about 10 Å.
However, in certain embodiments, depending upon the selected DNA-binding domains and
the configuration, the linker may span a distance of up to about 50 Å. For instance, the
ZFHD1 protein contains a glycine-glycine-arginine-arginine linker which joins the
carboxyl-terminal region of zinc finger 2 to the amino-terminal region of the Oct-1
homeodomain.
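
As a rough aid to the length considerations above, the following sketch estimates how far a peptide linker can reach. It assumes a rule-of-thumb rise of about 3.5 Å per residue for a fully extended chain; that figure is an assumption introduced here, not a value from the text, and a real linker will usually span less because it is not fully extended. The function names are hypothetical.

```python
import math

# Assumed rise per residue for a fully extended peptide (rule of thumb).
ANGSTROM_PER_RESIDUE = 3.5


def max_span_angstrom(linker_sequence):
    """Upper bound on the distance (in angstroms) a linker could bridge."""
    return len(linker_sequence) * ANGSTROM_PER_RESIDUE


def min_residues_for_span(distance_angstrom):
    """Smallest number of residues whose extended length covers the distance."""
    return math.ceil(distance_angstrom / ANGSTROM_PER_RESIDUE)


if __name__ == "__main__":
    # One-letter code for the glycine-glycine-arginine-arginine linker.
    print("GGRR maximum span (angstroms):", max_span_angstrom("GGRR"))
    print("Residues needed to bridge 10 angstroms:", min_residues_for_span(10))
```

An estimate of this kind only bounds the problem; whether a given linker permits both domains to bind without steric interference still has to be settled by modeling or experiment, as the text notes.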

Within the linker, the amino acid sequence may be varied based on the preferred
characteristics of the linker as determined empirically or as revealed by modeling. For
instance, in addition to a desired length, modeling studies may show that side groups of
certain nucleotides or amino acids may interfere with binding of the protein. The primary
criterion is that the linker join the DNA-binding domains in such a manner that they retain
their ability to bind their respective DNA sequences, and thus a linker which interferes with
this ability is undesirable. A desirable linker should also be able to constrain the relative
three-dimensional positioning of the domains so that only certain composite sites are
recognized by the chimeric protein. Other considerations in choosing the linker include
flexibility of the linker, charge of the linker and selected binding domains, and presence of
some amino acids of the linker in the naturally-occurring domains. The linker can also be
designed such that residues in the linker contact DNA, thereby influencing binding affinity
or specificity, or to interact with other proteins. For example, a linker may contain an amino
acid sequence which can be recognized by a protease so that the activity of the chimeric
protein could be regulated by cleavage. In some cases, particularly when it is necessary to
span a longer distance between the two DNA-binding domains or when the domains must
be held in a particular configuration, the linker may optionally contain an additional folded
domain.

D. Additional domains. 

Additional domains may be included in the various chimeric proteins of this
invention, e.g., a nuclear localization sequence, a transcription regulatory domain, a ligand
binding domain, a protein-binding domain, a domain capable of cleaving a nucleic acid, etc.

For example, in some embodiments the chimeric proteins will contain a cellular
targeting sequence which provides for the protein to be translocated to the nucleus.
Typically a nuclear localization sequence has a plurality of basic amino acids, referred to as
a bipartite basic repeat (reviewed in Garcia-Bustos et al., Biochimica et Biophysica Acta
(1991) 1071, 83-101). This sequence can appear in any portion of the molecule internal or
proximal to the N- or C-terminus and results in the chimeric protein being localized inside
the nucleus.

DNA sequences encoding individual DNA-binding sub-domains and linkers, if any,
are joined such that they constitute a single open reading frame encoding a chimeric protein
containing the composite DNA-binding region and capable of being translated into a single
polypeptide harboring all component domains. This protein-encoding DNA sequence is
then placed into a conventional plasmid vector that directs the expression of the protein in
the appropriate cell type. For testing of proteins and determination of binding specificity
and affinity, it may be desirable to construct plasmids that direct the expression of the
protein in bacteria or in reticulocyte-lysate systems. For use in the production of proteins in
mammalian cells, the protein-encoding sequence is introduced into an expression vector
that directs expression in these cells. Expression vectors suitable for such uses are well
known in the art. Various sorts of such vectors are commercially available.
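
The requirement that the joined sequences form a single open reading frame can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical and the short sequences are toy placeholders, not sequences of any actual domain. It joins the segments in order and confirms that the result begins with ATG, contains a whole number of codons, ends with a stop codon, and has no internal stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_orf(segments):
    """Join coding segments and verify they form one uninterrupted reading frame."""
    orf = "".join(segment.upper() for segment in segments)
    if not orf:
        raise ValueError("no sequence supplied")
    if len(orf) % 3 != 0:
        raise ValueError("joined sequence is not a whole number of codons")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[0] != "ATG":
        raise ValueError("reading frame does not begin with ATG")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("reading frame does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        raise ValueError("internal stop codon would truncate the chimeric protein")
    return orf


if __name__ == "__main__":
    # Toy placeholder segments (hypothetical, for illustration only).
    domain_a = "ATGGCTAAA"     # start codon plus two codons
    linker = "GGTGGTCGTCGT"    # codons for Gly-Gly-Arg-Arg
    domain_b = "GAAAAATGA"     # two codons plus a stop codon
    print(assemble_orf([domain_a, linker, domain_b]))
```

A check like this catches frameshifts introduced at the junctions between segments, which is the most common way a joined construct fails to encode all component domains.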

The ability to engineer binding regions with novel DNA binding specificities
permits composite DNA binding regions to be designed and produced to interact
specifically with any desired nucleotide sequence. Thus a clinically interesting sequence
may be chosen and a composite DNA binding region engineered to recognize it. For
example, a composite DNA binding region may be designed to bind chromosomal
breakpoints and repress transcription of an otherwise activated oncogene (see Choo et al.
(1994) Nature 372, 642-645); to bind viral DNA or RNA genomes and block or activate
expression of key viral genes; or to specifically bind the common mutated versions of a
mutational hotspot sequence in an oncogene and repress transcription (such as the mutation
of codon 21 of human ras), and analogously to bind mutated tumor suppressor genes and
activate their transcription.

Additionally, in optimizing chimeric proteins of this invention it should be
appreciated that immunogenicity of a polypeptide sequence is thought to require the
binding of peptides by MHC proteins and the recognition of the presented peptides as
foreign by endogenous T-cell receptors. It may be preferable, at least in gene therapy
applications, to alter a given foreign peptide sequence to minimize the probability of its
being presented in humans. For example, peptide binding to human MHC class I molecules
has strict requirements for certain residues at key 'anchor' positions in the bound peptide:
e.g., HLA-A2 requires leucine, methionine or isoleucine at position 2 and leucine or valine at
the C-terminus (for review see Stern and Wiley (1994) Structure 2, 145-251). Thus in
engineered proteins, this periodicity of these residues could be avoided.
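
The anchor-residue rule quoted above lends itself to a simple scan of an engineered sequence. The following sketch lists every window that has leucine, methionine or isoleucine at position 2 and leucine or valine at its C-terminus. The default window length of nine residues is an assumption introduced here (the text does not specify a peptide length), the function name is hypothetical, and a match shows only that the anchor pattern is present, not that the peptide would actually be presented.

```python
POSITION_2_ANCHORS = set("LMI")   # leucine, methionine, isoleucine
C_TERMINAL_ANCHORS = set("LV")    # leucine, valine


def find_hla_a2_anchor_matches(protein, length=9):
    """Return (1-based start, peptide) for windows matching the anchor pattern."""
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        if peptide[1] in POSITION_2_ANCHORS and peptide[-1] in C_TERMINAL_ANCHORS:
            hits.append((start + 1, peptide))
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical) containing one matching window.
    for start, peptide in find_hla_a2_anchor_matches("GGLAAAAAALG"):
        print(start, peptide)
```

Windows flagged by such a scan are candidates for substitution at one of the two anchor positions, provided the change does not disturb DNA binding.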

VIII. Host cells

Host cells which may be used in accord with the various embodiments of the
invention include prokaryotes and eukaryotes.

Exemplary eukaryotic host cells include yeast and mammalian cells.

Exemplary prokaryotic host cells are gram-negative bacteria such as Escherichia 
coli, or gram-positive bacteria such as Bacillus subtilis. 

Recognized prokaryotic hosts include bacterial strains of Escherichia, Bacillus,
Streptomyces, Pseudomonas, Salmonella, Serratia, Streptococcus, Lactobacillus,
Enterococcus, Shigella, and the like. In preferred embodiments, the prokaryotic host is
compatible with the replicon and control sequences in the expression plasmid.

Preferred prokaryotic host cells for use in carrying out the present invention are
strains of the bacteria Escherichia, although Bacillus and other genera are also useful.
Techniques for transforming these hosts and expressing foreign genes cloned in them are
well known in the art (see e.g., Maniatis et al. and Sambrook et al., ibid.). Vectors used for
expressing foreign genes in bacterial hosts will generally contain a selectable marker, such
as a gene for antibiotic resistance, and a promoter which functions in the host cell.
Appropriate promoters include trp (Nichols et al. (1983) Meth. Enzymol. 101:155-164), lac
(Casadaban et al. (1980) J. Bacteriol. 143:971-980), and phage lambda promoter systems
(Queen (1983) J. Mol. Appl. Genet. 2:1-10). Plasmids useful for transforming bacteria
include pBR322 (Bolivar et al. (1977) Gene 2:95-113), the pUC plasmids (Messing (1983)
Meth. Enzymol. 101:20-77; Vieira and Messing (1982) Gene 19:259-268), pCQV2
(Queen, supra), pACYC plasmids (Chang et al. (1978) J. Bacteriol. 134:1141), pRW
plasmids (Lodge et al. (1992) FEMS Microbiol. Lett. 95:271), and derivatives thereof.

The choice of appropriate host cell will also be influenced by the choice of detection
signal. For instance, the choice of cell can be influenced by the desire to use a reporter
construct which encodes a particular direct FACS tag or indirect FACS tag. The reporter
gene may be a host cell gene that has been operably linked to a "bait-responsive" promoter.
Alternatively, it may be a heterologous gene that has been so linked. Suitable genes and
promoters are discussed above. Accordingly, it will be understood that to achieve selection
or screening by FACS, the host cell must have an appropriate phenotype so that expression
of the reporter provides a statistically significant difference in fluorescence relative to the
host cell without the reporter gene product.

IX. Exemplary Uses of the Interaction Trap Systems

The interaction trap systems of the present invention can be used, inter alia, for
identifying protein-protein interactions, e.g., for generating protein linkage maps, for
identifying therapeutic targets, and/or for general cloning strategies. As described above,
the ITS can be derived with a cDNA library to produce a variegated array of bait or prey
proteins which can be screened for interaction with, for example, a known protein
expressed as the corresponding fusion protein in the ITS. In other embodiments, both the
bait and prey proteins can be derived to each provide variegated libraries of polypeptide
sequences. One or both libraries can be generated by random or semi-random mutagenesis.
For example, random libraries of polypeptide sequences can be "crossed" with one another
by simultaneous expression in the subject assay. Such embodiments can be used to identify
novel interacting pairs of polypeptides.

Alternatively, the subject ITS can be used to map residues of a protein involved in a
known protein-protein interaction. Thus, for example, various forms of mutagenesis can be
utilized to generate a combinatorial library of either bait or prey polypeptides, and the
ability of the corresponding fusion protein to function in the ITS can be assayed. Mutations
which result in diminished (or potentiated) binding between the bait and prey fusion
proteins can be detected by the level of reporter gene activity. For example, mutants of a
particular protein which alter interaction of that protein with another protein can be
generated and isolated from a library created, for example, by alanine scanning mutagenesis
and the like (Ruf et al., (1994) Biochemistry 33:1565-1572; Wang et al., (1994) J. Biol.
Chem. 269:3095-3099; Balint et al., (1993) Gene 137:109-118; Grodberg et al., (1993) Eur.
J. Biochem. 218:597-601; Nagashima et al., (1993) J. Biol. Chem. 268:2888-2892;
Lowman et al., (1991) Biochemistry 30:10832-10838; and Cunningham et al., (1989)
Science 244:1081-1085); by linker scanning mutagenesis (Gustin et al., (1993) Virology
193:653-660; Brown et al., (1992) Mol. Cell Biol. 12:2644-2652; McKnight et al., (1982)
Science 232:316); by saturation mutagenesis (Meyers et al., (1986) Science 232:613); by
PCR mutagenesis (Leung et al., (1989) Method Cell Mol Biol 1:11-19); or by random
mutagenesis (Miller et al., (1992) A Short Course in Bacterial Genetics, CSHL Press, Cold
Spring Harbor, N.Y.; and Greener et al., (1994) Strategies in Mol Biol 7:32-34). Linker
scanning mutagenesis, particularly in a combinatorial setting, is an attractive method for
identifying truncated (bioactive) forms of a protein, e.g., to establish binding domains.

In other embodiments, the ITS can be designed for the isolation of genes encoding
proteins which physically interact with a protein/drug complex. The method relies on
detecting the reconstitution of a transcriptional activator in the presence of the drug. If the
bait and prey fusion proteins are able to interact in a drug-dependent manner, the interaction
may be detected by reporter gene expression.

Another aspect of the present invention relates to the use of the interaction trap
systems in the development of assays which can be used to screen for drugs which are
either agonists or antagonists of a protein-protein interaction of therapeutic consequence
(U.S. Patent No. 6,200,759). In a general sense, the assay evaluates the ability of a
compound to modulate binding between the bait and prey polypeptides. Exemplary
compounds which can be screened include peptides, nucleic acids, carbohydrates, small
organic molecules, and natural product extract libraries, such as isolated from animals,
plants, fungus and/or microbes.

In many drug screening programs which test libraries of compounds and natural
extracts, high throughput assays are desirable in order to maximize the number of
compounds surveyed in a given period of time. The subject ITS-derived screening assays
can be carried out in such a format, and accordingly may be used as a "primary" screen.
Accordingly, in an exemplary screening assay of the present invention, an ITS is generated
to include specific bait and prey fusion proteins known to interact, and compound(s) of
interest. Detection and quantification of reporter gene expression provides a means for
determining a compound's efficacy at inhibiting (or potentiating) interaction between the
bait and prey polypeptides. In certain embodiments, the approximate efficacy of the
compound can be assessed by generating dose response curves from reporter gene
expression data obtained using various concentrations of the test compound. Moreover, a
control assay can also be performed to provide a baseline for comparison. In the control
assay, expression of the reporter gene is quantitated in the absence of the test compound.
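
The dose-response analysis described above can be sketched as follows. This is a minimal illustration, not the procedure of the invention: it normalizes each reporter reading to the no-compound control and estimates, by log-linear interpolation between adjacent concentrations, the dose at which expression falls to 50% of control. It handles only the inhibitory case; the function names and all data values are hypothetical.

```python
import math


def percent_of_control(signal, control_signal):
    """Reporter signal expressed as a percentage of the no-compound control."""
    return 100.0 * signal / control_signal


def estimate_ic50(concentrations, signals, control_signal):
    """Interpolate the concentration giving 50% of control reporter expression.

    Returns None if the measured range never crosses 50% (inhibitors only).
    """
    points = sorted(zip(concentrations, signals))
    responses = [(c, percent_of_control(s, control_signal)) for c, s in points]
    for (c_lo, r_lo), (c_hi, r_hi) in zip(responses, responses[1:]):
        if r_lo >= 50.0 >= r_hi and r_lo != r_hi:
            fraction = (r_lo - 50.0) / (r_lo - r_hi)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


if __name__ == "__main__":
    # Hypothetical readings: molar concentrations and arbitrary reporter units.
    doses = [1e-9, 1e-8, 1e-7, 1e-6]
    readings = [950.0, 800.0, 200.0, 50.0]
    print("Estimated IC50 (M):", estimate_ic50(doses, readings, control_signal=1000.0))
```

Interpolation is the simplest possible estimate; with replicate measurements one would normally fit a sigmoidal curve instead, and a potentiating compound would require the mirror-image calculation.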

In another exemplary embodiment, a therapeutic target devised as the bait-prey
complex is expressed in the same cell with a peptide library with the goal of identifying
peptides which potentiate or inhibit the bait-prey interaction. Many techniques are known in
the art for expressing peptide libraries intracellularly. In one embodiment, the peptide
library is provided as part of a chimeric thioredoxin protein, e.g., expressed as part of the
active loop.

In yet another embodiment, the interaction trap systems of the invention can be
generated in the form of a diagnostic assay to detect the interaction of two proteins, e.g.,
where the gene from one is isolated from a biopsied cell. For instance, there are many
instances where it is desirable to detect mutants which, while expressed at appreciable
levels in the cell, are defective at binding other cellular proteins. Such mutants may arise,
for example, from fine mutations, e.g., point mutants, which may be impractical to detect
by the diagnostic DNA sequencing techniques or by the immunoassays. The present
invention accordingly further contemplates diagnostic screening assays which generally
comprise cloning one or more cDNAs from a sample of cells, and expressing the cloned
gene(s) as part of an ITS under conditions which permit detection of an interaction between
that recombinant gene product and a target protein. Accordingly, the present invention
provides a convenient method for diagnostically detecting mutations to genes encoding
proteins which are unable to physically interact with a "target" protein, which method relies
on detecting the expression of the reporter gene in a bait/prey-dependent fashion as
described above.

As described in more detail above, in certain embodiments, the various interaction
trap systems of the invention can be used to identify or optimize DNA-protein interactions.
For example, the subject method can be used to identify mutant or composite DNA binding
domains having desired sequence binding preferences. It can also be used to identify DNA
sequences which are selectively bound by a given DNA binding protein and/or to determine
the sequence specificity of a DNA binding protein.

In another embodiment, the present invention provides a method of detecting
protein-RNA interactions (U.S. Patent No. 5,750,667). The method begins with a host cell
that contains a reporter gene expressing a detectable protein. The reporter gene is activated
by an amino acid sequence including a transcriptional activation domain when the
transcriptional activation domain is in sufficient proximity to the reporter gene.

The host cell also contains three different chimeric genes. The first chimeric gene is
capable of being expressed in the host cell and encodes a first hybrid protein. The first
hybrid protein comprises a DNA-binding domain that recognizes a binding site on the
reporter gene in the host cell and a first RNA-binding domain. (When we refer to an RNA-
binding "domain", we mean an amino acid sequence that is capable of binding an RNA
molecule. This domain may be a fragment of a larger protein or may comprise an entire
protein.)

The second chimeric gene is also capable of being expressed in the host cell and
comprises a DNA sequence that encodes a second hybrid protein. The second hybrid
protein comprises a transcriptional activation domain and a second RNA-binding domain.

The third chimeric gene is capable of being transcribed to generate a hybrid RNA in
the host cell. The hybrid RNA comprises a first RNA sequence capable of binding to either
the first or second RNA-binding domain and a second RNA sequence to be tested for
interaction with the RNA-binding domain that is not bound to the first RNA sequence.
Interaction between both the first RNA-binding domain and the hybrid RNA and the second
RNA-binding domain and the hybrid RNA causes the transcriptional activation domain to
activate transcription of the reporter gene.

After subjecting the host cell to conditions under which the first hybrid protein, the
second hybrid protein, and the hybrid RNA are expressed in sufficient quantity for the
reporter gene to be activated, one determines whether the reporter gene has been expressed
to a degree greater than expression in the absence of an interaction between both the first
RNA-binding protein and the hybrid RNA and the second RNA-binding protein and the
hybrid RNA. If the reporter gene has been expressed to a greater degree, this indicates that
an RNA-protein interaction has taken place.

In various embodiments, either one of the RNA-binding proteins or either the first
or second sequence of the hybrid RNA may be tested. One might have a specific RNA-
binding protein and determine which of many different RNA sequences bound to the
protein, or one might have a particular RNA sequence and determine which of many RNA-
binding proteins bound to that specific RNA sequence. A multiplicity of proteins can be
simultaneously tested to determine whether any interact with a known RNA molecule.
Similarly, a multiplicity of RNAs can be simultaneously tested to determine whether any
interact with a known RNA-binding protein.

In still other embodiments, the methods of the present invention, as described
above, may be practiced using a kit for detecting an interaction between two proteins or a
protein and a nucleic acid sequence.

In an illustrative embodiment, a kit for detecting a protein-protein interaction
includes two vectors, a host cell, and (optionally) a set of primers for cloning one or more
genes encoding sample proteins from a patient sample. The first vector may contain a
promoter, a transcription termination signal, and other transcription and translation signals
functionally associated with the first chimeric gene in order to direct the expression of the
first chimeric gene. The first chimeric gene includes a DNA sequence that encodes a DNA-
binding domain and a unique restriction site(s) for inserting a DNA sequence encoding
either the target or sample protein, or a fragment thereof, in such a manner that the cloned
sequence is expressed as part of a hybrid protein with the DNA-binding domain. The first
vector also includes a means for replicating itself (e.g., an origin of replication) in the host
cell. In preferred embodiments, the first vector also includes a first marker gene, the
expression of which in the host cell permits selection of cells containing the first marker
gene from cells that do not contain the first marker gene. Preferably, the first vector is a
plasmid, though it may optionally be genomically integrated where the chimeric gene
encodes the target protein.

The kit also includes a second vector which contains a second chimeric gene. The
second chimeric gene also includes a promoter and other relevant transcription and
translation sequences to direct expression of a second chimeric protein. The second
chimeric gene includes a DNA sequence that encodes an activation tag and a unique
restriction site(s) to insert a DNA sequence encoding either the target or sample protein
(whichever is not cloned into the first chimeric gene), in such a manner that the cloned
protein is capable of being expressed as part of a fusion protein with the activation tag.
Again, as appropriate, the second vector can be genomically integrated.

In general, the kit will also be provided with one of the two vectors already
including the target protein.

Accordingly, in using the kit, the interaction of the target protein and the sample
protein in the host cell causes a measurably greater expression of the reporter gene than
when the DNA-binding domain and the activation tag are present in the absence of an
interaction between the two fusion proteins. The cells containing the two hybrid proteins
are incubated in/on an appropriate medium and the cells are monitored for the measurable
activity of the gene product of the reporter construct. A positive test for this activity is an
indication that the target protein and the sample protein have interacted. Such interaction
brings their respective DNA-binding domain and activation tag into sufficiently close
proximity to cause efficient transcription of the reporter gene.

As discussed in more detail above, a similar kit for detecting polypeptide-nucleic
acid interactions is also encompassed in the invention.

Exemplification

The invention, now being generally described, will be more readily understood by 
reference to the following examples, which are included merely for purposes of illustration 
of certain aspects and embodiments of the present invention and are not intended to limit 
the invention. 

Example 1 

We have developed a bacterial "two-hybrid" system that readily allows selection
from libraries greater than 10* in size. Our bacterial system may be used to study either
protein-DNA or protein-protein interactions, and it offers a number of potentially
significant advantages over existing yeast-based one-hybrid and two-hybrid methods. We
tested our system by selecting zinc finger variants (from a large randomized library) that
bind tightly and specifically to desired DNA target sites. Our new method allows sequence-
specific zinc fingers to be isolated in a single selection step, and thus it should be more
rapid than phage display strategies which typically require multiple
enrichment/amplification cycles. Given the large library sizes our bacterial-based selection
system can handle, this method should provide a powerful tool for identifying and
optimizing protein-DNA and protein-protein interactions.

Selection and screening methods are powerful tools for studying macromolecular
interactions. Examples of such methods include the yeast-based one-hybrid and two-hybrid
systems (for studying protein-DNA and protein-protein interactions, respectively) and
bacterial-based phage display methods (for studying either type of interaction). These
systems have been used to identify interaction partners for particular DNA or protein
targets, and they have also been used in combination with mutagenesis or randomization
strategies to study the details of biologically important interactions (for reviews, see 1-5).
The development of bacterial-based systems analogous to the yeast one-hybrid and two-
hybrid methods could, in principle, facilitate the rapid analysis of larger libraries (due to the
higher transformation efficiency and faster growth rate observed with E. coli). Such
methods might also be faster than phage display, which is an enrichment technique
requiring multiple rounds of affinity purification and amplification (see, for example, 6).
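
Why transformation efficiency limits usable library size can be seen from a standard sampling estimate. The sketch below assumes that each transformant carries one library member drawn uniformly at random, an idealization that is not stated in the text, so that the chance a given member is represented among N transformants from a library of L members is approximately 1 - exp(-N/L). The function names and the example library size are hypothetical.

```python
import math


def coverage_probability(library_size, transformants):
    """Chance that any one library member appears at least once (uniform sampling)."""
    return 1.0 - math.exp(-transformants / library_size)


def transformants_needed(library_size, target_probability):
    """Transformants required for a given per-member representation probability."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    return math.ceil(-library_size * math.log(1.0 - target_probability))


if __name__ == "__main__":
    size = 1_000_000  # hypothetical library size
    print("Transformants for 95% representation:", transformants_needed(size, 0.95))
    print("Coverage with 3x oversampling:", round(coverage_probability(size, 3 * size), 4))
```

Under this model roughly three transformants per library member are needed for 95% representation, so the number of independent transformants a host can yield sets a practical ceiling on library size.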

Several bacterial one- and two-hybrid systems have been proposed, but there have
been no reports in which these actually have been used to identify candidates from a real
library (reviewed in 7). This may reflect practical limitations with these existing systems.
Most of these methods are actually designed as genetic screens (8-10) and thus cannot be
readily used with libraries greater than -10^-10^ in size. Two genetic selection systems have
been proposed for studying protein-protein interactions, but neither method is readily
adaptable to the analysis of protein-DNA interactions (11, 12).

In this report we describe the design and testing of an E. coli-based selection method
that can detect either protein-DNA or protein-protein interactions and that can handle
libraries larger than 10^ in size. We tested our new method by selecting Cys2His2 zinc
finger variants similar to those previously isolated by phage display (6, 13). The results of
our selection, the rapidity of our method, and the versatility of the underlying
transcriptional activation scheme suggest that this bacterial-based system should provide a
useful tool for identifying and characterizing protein-DNA and protein-protein interactions.

Materials And Methods 

Selective medium. "HIS selective medium" is composed of M9 minimal medium
supplemented with 10 µM ZnCl2, 10 µg/ml thiamine, 200 µM adenine, 50 µg/ml
carbenicillin, 30 µg/ml chloramphenicol, 30 µg/ml kanamycin, 50 µM IPTG, 20 mM
3-aminotriazole (3-AT), and 17 amino acids (all except histidine, methionine, and cysteine).
For HIS selective medium plates, agar was added to a final concentration of 1.5%.
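
The recipe above lends itself to a simple dilution calculation. The following Python sketch uses only the two entries whose values are unambiguous in the text (20 mM 3-AT and 1.5% agar). The 1 M 3-AT stock concentration and the function names are assumptions made for illustration, and the 1.5% is read here as weight per volume, which the text does not state.

```python
def stock_volume_ml(final_conc, stock_conc, total_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2 (same units for both concentrations)."""
    return final_conc * total_ml / stock_conc


def agar_grams(percent_w_v, total_ml):
    """Grams of agar for a given percent (w/v): 1% w/v is 1 g per 100 ml."""
    return percent_w_v * total_ml / 100.0


# Hypothetical batch: 1 liter of HIS selective medium for plates.
BATCH_ML = 1000
ASSUMED_3AT_STOCK_MM = 1000  # assumption: a 1 M stock; not specified in the text

three_at_ml = stock_volume_ml(20, ASSUMED_3AT_STOCK_MM, BATCH_ML)  # 20 mM final
agar_g = agar_grams(1.5, BATCH_ML)                                  # 1.5% final

print(f"3-AT stock to add: {three_at_ml} ml; agar to add: {agar_g} g")
```

The same helper applies to the other supplements once their stock concentrations are known.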

Plasmids and bacterial strains. The αGal4 protein used in this study contains
residues 1-248 of the E. coli RNA polymerase α subunit fused (by an Ala-Ala-Ala linker) to
residues 58-97 of the yeast Gal4 protein. The pACYC184-derived plasmid pACL-αGal4
expresses αGal4 from a tandem, IPTG-inducible lpp/lacUV5 promoter.

The Gal11P-Zif123 fusion protein contains residues 263-352 of the yeast Gal11P
protein (with an N342V mutation [14]) fused by a nine amino acid linker
Ala-Ala-Ala-Pro-Arg-Val-Arg-Thr-Gly to residues 327-421 of Zif268 (the region encoding the
three zinc fingers). The phagemid pBR-GP-Z123 expresses the Gal11P-Zif123 hybrid protein
from an IPTG-inducible lacUV5 promoter. The pBR-GP-Z12BbsI phagemid is analogous to
pBR-GP-Z123 except that Zif finger 3 is replaced with a modified Zif finger 1 in which the
sequence encoding residues -1 through 6 of the finger recognition helix is replaced by
unrelated sequence (a "stuffer" fragment) flanked by BbsI restriction sites. All phagemids
used in this study can be easily "rescued" from cells by infection with a filamentous helper
phage; infectious phage particles produced by these cells contain single-stranded phagemid
DNA.






The reporter construct that expresses HIS3 (Pzif-HIS3-aadA) has the Zif268 binding
site sequence 5'-GCGTGGGCG-3' centered at base pair -63 relative to the transcription start
site of a weak E. coli lac promoter derivative (the Pwk promoter). The three selection strain
reporters change the zinc finger binding site of Pzif-HIS3-aadA, replacing the sequence
5'-TCGACAAGCGTGGGCG-3' (bases -74 to -59 relative to the transcription start site) with
sequences that should allow binding of the desired zinc finger variants:
5'-CAAGGGTTCAGGGGCG-3' (for NRE), 5'-GGCTATAAAAGGGGCG-3' (for TATA), or
5'-TGGGACATGTTGGGCG-3' (for p53). Each of these reporters was transferred (by
recombination) to an F' episome encoding lacIq repressor and then introduced into strain
KJ1C in a single step essentially as previously described (15, J.K.J. & C.O.P., unpublished).
The resulting strains were then each transformed with the pACL-αGal4 plasmid to create
the NRE, TATA, p53, and Zif "selection strains."
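
The relationship between the four 16-base reporter sequences and the three-base target subsites can be checked directly. The Python sketch below assumes that the nine bases at the 3' end of each listed sequence form the zinc finger binding site and that finger 3 contacts the 5'-most triplet of that site (as is known for Zif268); the function and variable names are illustrative only.

```python
# 16-base reporter sequences as given in the text (5' to 3').
REPORTER_SITES = {
    "Zif":  "TCGACAAGCGTGGGCG",
    "NRE":  "CAAGGGTTCAGGGGCG",
    "TATA": "GGCTATAAAAGGGGCG",
    "p53":  "TGGGACATGTTGGGCG",
}


def finger_subsites(site):
    """Split the last nine bases into triplets.

    Assumption: finger 3 reads the 5'-most triplet and finger 1 the
    3'-most triplet of the nine-base site.
    """
    core = site[-9:]
    return {"finger3": core[0:3], "finger2": core[3:6], "finger1": core[6:9]}


for name, seq in REPORTER_SITES.items():
    print(name, finger_subsites(seq))
```

Under these assumptions the finger 3 triplets come out as GCG (Zif), TCA (NRE), AAA (TATA), and TGT (p53).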

E. coli strain KJ1C, which has a deletion in the hisB gene, was constructed as
follows: Strain SB3930 (F- ΔhisB463) was transduced to tetracycline resistance with P1vir
phage grown on strain JCB40 (F- Δ(gpt-proAB-arg-lac)XIII zaj::Tn10).
Tetracycline-resistant colonies were screened for pro-, arg-, lac-, and his- phenotypes.

Randomized zinc finger library. The zinc finger variant library was constructed by
cassette mutagenesis. Randomized oligonucleotides synthesized using a two-column
method (16) were ligated to BbsI-digested pBR-GP-Z12BbsI vector (replacing the "stuffer"
fragment in this phagemid) to create a library of zinc finger variants. Each member of this
library has three zinc fingers: two constant fingers (fingers 1 and 2 of Zif268) and a third,
carboxy-terminal finger (also derived from finger 1 of Zif268) in which recognition helix
residues -1, 1, 2, 3, 5, and 6 are randomized. Our randomization scheme allows 24 possible
codons, encoding 19 possible amino acids (no cysteine) and one stop codon. The sequence
complexity of the resulting library is ~2 x 10^. This ligation was electroporated into E. coli
XL-1 Blue cells (Stratagene) and yielded >10^ transformants. These were pooled,
amplified, and then infected with VCS-M13 helper phage (Stratagene) to yield a high titer
stock of phage harboring single-stranded versions of the phagemid library.
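
The library complexity follows from the randomization scheme by simple arithmetic. The Python sketch below works it out from the figures stated above (24 codons and 19 amino acids at each of six positions). The stop-free fraction additionally assumes that all 24 codons occur with equal frequency, which the text does not claim.

```python
CODONS_PER_POSITION = 24       # from the text
AMINO_ACIDS_PER_POSITION = 19  # from the text (no cysteine)
STOP_CODONS_PER_POSITION = 1   # from the text
RANDOMIZED_POSITIONS = 6       # helix residues -1, 1, 2, 3, 5, and 6

# Number of distinct DNA sequences and distinct full-length helices.
dna_complexity = CODONS_PER_POSITION ** RANDOMIZED_POSITIONS
protein_complexity = AMINO_ACIDS_PER_POSITION ** RANDOMIZED_POSITIONS

# Assumption: equal codon usage, so each position avoids a stop with p = 23/24.
stop_free_fraction = (
    (CODONS_PER_POSITION - STOP_CODONS_PER_POSITION) / CODONS_PER_POSITION
) ** RANDOMIZED_POSITIONS

print(f"DNA sequence complexity: {dna_complexity:,}")
print(f"Distinct full-length helices: {protein_complexity:,}")
print(f"Fraction of clones without a stop codon: {stop_free_fraction:.3f}")
```

The first value, 24 to the sixth power, is 191,102,976, or roughly 2 x 10^8 distinct DNA sequences.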

Selection protocols. For initial selections with each of the three variant sites, >10^
selection strain cells were infected with approximately 10^ ampicillin-resistance
transducing units (ATU) of phage from the phagemid library. After recovery under
non-selective conditions for 1.5 hours, infected cells were plated at a density of
approximately 1 to 5 x 10^ ampicillin-resistant colonies/plate on "HIS selective medium."
(Control experiments indicated a false positive rate of ~3 x 10^- under these selection
conditions.) The largest surviving colonies were re-tested for growth on HIS selective
medium plates supplemented with 60 µg/ml spectinomycin (we chose 80-90 colonies for the
NRE and TATA selections and 240 colonies for the p53 selection). Candidates that re-grew
on these plates were then chosen for phagemid-linkage testing.

The second NRE selection was performed in two stages, in an attempt to isolate
additional variants. In the first stage, >10^ NRE selection strain cells were infected with ~6
x 10^ ATU of phage from the phagemid library. After recovery under non-selective
conditions, the infection was plated at a density of ~6 x 10^ ampicillin-resistant
colonies/plate on HIS selective medium. Half of the ~900 surviving colonies were pooled
and amplified in liquid HIS selective medium supplemented with 50 µg/ml spectinomycin.
This pooled culture was infected with VCS-M13 helper phage, grown overnight in 2x YT
medium supplemented with 50 µg/ml spectinomycin, and a high titer stock of phage was
isolated. For the second stage, fresh NRE selection cells were infected with phage
containing the enriched library of phagemids (from the first stage), and these were plated on
HIS selective medium plates. Twenty-four surviving colonies of various sizes were
re-tested for growth on HIS selective medium plates (supplemented with 60 µg/ml
spectinomycin) and these were then checked for phagemid-linkage.

Phagemid-linkage testing. Colonies that grew on HIS selective medium were then
tested to see whether survival was phagemid-linked. Candidates were inoculated into liquid
HIS selective medium supplemented with 100 µg/ml spectinomycin (but lacking 3-AT). All
of the NRE and TATA selection candidates, and the 72 fastest growing p53 selection
candidates, were each infected with VCS-M13 helper phage, and the resulting
phage-containing supernatants were harvested. Each candidate phage was used to infect fresh
selection strain cells (corresponding to those on which it was originally selected), and these
infected cells were plated on HIS selective medium. Growth under these conditions
demonstrates that activation of HIS3 expression is linked to the presence of the phagemid
(and thus suggests that the phagemid-encoded zinc fingers bind to the DNA target site on
which they were selected).

Binding site preference testing. To test the ability of the selected zinc fingers to
discriminate among different binding sites, recovered phagemids were introduced (by
phage infection) into NRE, p53, TATA, and Zif selection strain cells. Infected cells were
plated on HIS selective medium and growth was scored qualitatively after 24 hours of
growth at 37°C and 18 hours of continued growth at room temperature. Under these
conditions, we have found that survival of a selection strain indicates that the variant
finger can bind the target subsite present on the reporter. If a zinc finger variant permits
selection strains (other than the one in which it was initially isolated) to survive on
selective medium, this suggests that the variant finger binds semi- or non-specifically.
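
The scoring logic of the binding site preference test can be summarized in a few lines. The Python sketch below is a simplification made for illustration: it treats growth on each selection strain as a yes/no outcome, whereas growth was scored qualitatively in the experiments, and the category labels and function name are assumptions rather than terms defined by the protocol.

```python
STRAINS = ("NRE", "TATA", "p53", "Zif")


def classify_variant(target, grows_on):
    """Classify a zinc finger variant from the strains it allows to survive.

    target   -- selection strain on which the variant was isolated
    grows_on -- iterable of strain names that grew on HIS selective medium
    """
    survivors = set(grows_on)
    if target not in survivors:
        return "no binding detected"
    if survivors == {target}:
        return "specific"
    if survivors == set(STRAINS):
        return "non-specific"
    return "semi-specific"


print(classify_variant("TATA", ["TATA"]))         # only the target strain grows
print(classify_variant("TATA", ["TATA", "Zif"]))  # target plus one other strain
print(classify_variant("p53", STRAINS))           # all four strains grow
```

A graded score per strain (for example, colony size) would be needed to capture preferences such as very weak growth on a second strain.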

Sequencing of candidates. To prepare candidates for sequencing, the phage stocks
of clones with a phagemid-linked phenotype were used to infect XL-1 Blue cells. Plasmid
DNA was isolated from these cells (Qiagen) and used for dideoxy sequencing.

Results

An improved E. coli-based "two-hybrid" selection system for studying protein-DNA
and protein-protein interactions. To design a bacterial-based selection method for
studying protein-DNA and protein-protein interactions, we began with an existing genetic
screen previously developed by Hochschild and colleagues (7, 8, 10). In this screen, as in
the yeast "two-hybrid" system, there are two fusion proteins that interact in a way that leads
to transcriptional activation of a lacZ reporter gene (Figure 1A). One protein is composed
of a DNA binding domain (DBD) fused to another domain represented as X in Figure 1A.
The second protein contains the domain Y fused to a subunit of the E. coli RNA
polymerase. In this arrangement, activation of lacZ expression requires appropriate
protein-DNA and protein-protein interactions: The DBD must bind to a DNA binding site
(DBS) positioned near the promoter, and domain X must simultaneously interact with
domain Y to recruit RNA polymerase to the promoter, thereby activating transcription. The
major advantage of this system is that almost any protein-DNA (DBD-DBS) or
protein-protein (X-Y) interaction should mediate transcriptional activation. However,
because lacZ is used as a reporter gene in this system, candidates must be identified by a
visual phenotype (e.g., their blue color on X-gal plates). Thus, the system (in this form)
cannot readily be used to screen libraries larger than ~10^-10^ in size.
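
The activation requirement described above is a logical AND of the two interactions. The short Python sketch below states this as a truth table; it is a deliberately idealized yes/no model with illustrative names, and it ignores the graded dependence of activation on interaction strength.

```python
from itertools import product


def reporter_activated(dbd_binds_dbs: bool, x_binds_y: bool) -> bool:
    """Reporter is activated only when both interactions occur together."""
    return dbd_binds_dbs and x_binds_y


def truth_table():
    """Map every (DBD-DBS, X-Y) combination to the reporter outcome."""
    return {
        (dbd, xy): reporter_activated(dbd, xy)
        for dbd, xy in product((False, True), repeat=2)
    }


for (dbd, xy), active in truth_table().items():
    print(f"DBD binds DBS: {dbd!s:5}  X binds Y: {xy!s:5}  ->  activated: {active}")
```

Because either interaction can be the unknown, the same reporter serves for both protein-DNA and protein-protein studies.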

To improve this previously described system so that it can be used to analyze
libraries greater than 10^ in size, we replaced the lacZ gene used in the Hochschild genetic
screen with the selectable yeast HIS3 gene (Figure 1B). HIS3 encodes an enzyme required
for histidine biosynthesis that can complement the growth defect of E. coli cells bearing a
deletion in the homologous hisB gene (ΔhisB cells) (17, 18). In addition, 3-aminotriazole
(3-AT), which is a competitive inhibitor of HIS3, can be used to titrate the level of HIS3
expression required for growth on medium lacking histidine (19). (Thus, in the presence of
3-AT, a higher level of activation is required to allow growth on selective medium.) We
find that HIS3 is attractive for use with large libraries since: 1) >10^ ΔhisB cells harboring a
HIS3 gene expressed from the Pwk promoter can be plated on a regular-size Petri dish
containing HIS selective medium, and 2) these cells have a very low false
positive rate (about 3 x 10^-) on HIS selective medium (data not shown).
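
The way a competitive inhibitor raises the expression level needed for growth can be illustrated with the standard Michaelis-Menten rate law for competitive inhibition. The Python sketch below is illustrative only: all parameter values are arbitrary, the proportionality between Vmax and the amount of HIS3 enzyme is the usual textbook assumption, and the idea of a fixed rate needed for growth is a simplification introduced here.

```python
def his3_rate(vmax, substrate, km, inhibitor=0.0, ki=1.0):
    """Michaelis-Menten rate with a competitive inhibitor.

    A competitive inhibitor raises the apparent Km by the factor (1 + [I]/Ki).
    """
    apparent_km = km * (1.0 + inhibitor / ki)
    return vmax * substrate / (apparent_km + substrate)


def required_vmax(target_rate, substrate, km, inhibitor, ki):
    """Vmax (proportional to enzyme amount) needed to reach target_rate."""
    apparent_km = km * (1.0 + inhibitor / ki)
    return target_rate * (apparent_km + substrate) / substrate


# Arbitrary illustrative values (not measured constants).
no_3at = required_vmax(target_rate=1.0, substrate=1.0, km=1.0, inhibitor=0.0, ki=1.0)
with_3at = required_vmax(target_rate=1.0, substrate=1.0, km=1.0, inhibitor=20.0, ki=1.0)

print(f"Relative enzyme level needed without inhibitor: {no_3at}")
print(f"Relative enzyme level needed with inhibitor:    {with_3at}")
```

In this toy model, raising the inhibitor concentration raises the enzyme level needed to sustain the same rate, which is the sense in which 3-AT sets the stringency of the selection.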

Our modified construct also contains the bacterial aadA gene (which confers
resistance to the antibiotic spectinomycin) (20) positioned just downstream of the HIS3
gene (Figure 1B). We refer to this construct as the Pwk-HIS3-aadA operon because Pwk
directs coordinated expression of the HIS3 and aadA genes (data not shown). Although
selection for increased aadA expression is not suitable for direct analysis of large libraries
(we find this allows a relatively high background breakthrough [data not shown]), we used
spectinomycin in certain steps to maintain selective pressure (see Materials and Methods).
In addition, we also constructed reporter strains which harbor a lacZ gene positioned just
downstream of the HIS3 gene. In this synthetic operon, Pwk directs coordinated expression
of the HIS3 and lacZ genes. In this configuration, basal expression of lacZ is low and thus
cells harboring this reporter construct form white colonies on X-gal-containing medium
(data not shown).

Zinc finger domains can bind DNA and activate transcription in E. coli. We
tested our new E. coli-based system by applying it to a problem previously studied using
phage display: the selection, from a large randomized library, of zinc finger variants with
altered DNA binding specificities (for review, see 21). However, before proceeding with
selections, we first examined whether a wild-type zinc finger protein could bind DNA and
activate transcription in our system. (Relatively little information was available on the
activity of Cys2His2 zinc finger proteins in bacteria.) To do this, we constructed fusion
proteins containing fragments of the yeast Gal11P and Gal4 proteins that had previously
been shown to interact with each other (10, 14). Thus, we fused a Gal11P fragment to the
three zinc fingers of the murine Zif268 protein (creating the Gal11P-Zif123 protein), and
we replaced the carboxy-terminal domain of the E. coli RNA polymerase α subunit with a
Gal4 fragment (creating the chimeric αGal4 protein). A Zif268 DNA binding site was
positioned upstream of our Pwk-HIS3-aadA operon to create the Pzif-HIS3-aadA operon
(Figure 1C), and this cassette was introduced into a ΔhisB E. coli strain in single copy to
create the "Zif reporter strain."

We then tested whether the Gal11P-Zif123 and αGal4 proteins could work together
as a "two-hybrid" system to activate transcription of the Pzif-HIS3-aadA operon. We find
that Zif reporter strain cells expressing only the αGal4 protein do not grow on HIS selective
medium, but the same cells can grow when the Gal11P-Zif123 protein is expressed together
with the αGal4 protein. We also find that activation requires all three Zif268 fingers: a
Gal11P fusion protein which contains only the first two zinc fingers from Zif268 does not
permit growth on selective medium. We performed similar experiments using reporter cells
harboring the Pzif-HIS3-lacZ operon and obtained similar results (data not shown) on HIS
selective medium. In addition, cells harboring the HIS3-lacZ operon in which the promoter
is activated by the Gal11P-Zif123/αGal4 interaction form blue colonies on X-gal medium,
indicating increased expression of the lacZ reporter gene. These results indicate that the
Gal11P-Zif123 and αGal4 proteins can work together to activate transcription in our E. coli
system. We presume that the DNA-bound Gal11P-Zif123 acts by recruiting (or stabilizing)
RNA polymerase complexes that have incorporated αGal4. These results also give some
information about the DNA-affinity threshold for activation since we find that fingers 1 and
2 of Zif268 alone are not sufficient.

Selection strategy for isolation of zinc finger variants. Since our initial results
indicated that zinc fingers could function in E. coli and that our activation scheme worked
as expected, we proceeded to test our system by isolating zinc finger variants from a large
randomized library. We chose target DNA subsites that had been used in an earlier phage
display study (6, 13). This previous study had involved selecting zinc finger variants that
would bind to sequences normally recognized by important eukaryotic DNA-binding
proteins. The AAA target subsite used in our experiments is part of a TATA box, the TGT
target subsite is part of a p53 binding site, and the TCA target subsite is part of a nuclear
receptor element (NRE). We refer to these sequences as the "TATA," "p53," and "NRE"
target subsites.

Our strategy for identifying variant zinc fingers that bind specifically to a particular
"target" DNA subsite relies on the ability of our system to distinguish zinc finger
proteins that bind using two fingers (recognizing 6-7 bp) from those that bind using three
fingers (recognizing 9-10 bp). We synthesized a large library of three-finger Zif268
derivatives (each fused to the Gal11P fragment). In this library, the first two fingers of
Zif268 remain constant, but the recognition helix of the third, carboxy-terminal finger is
randomized (see Materials and Methods). We also prepared "selection strains" with the
appropriate zinc finger binding sites upstream of the Pwk-HIS3-aadA operon. (Each of
these has the normal binding subsites for fingers 1 and 2 of Zif268, but the third subsite
[black notched rectangle, Figure 2] is changed to include the "target" DNA subsites of
interest [AAA for TATA; TGT for p53; TCA for NRE].) Each of these ΔhisB selection
strains also contains a plasmid expressing the αGal4 protein, and these bacteria are referred
to as the TATA, p53, and NRE selection strains. (As a control for use in binding site
specificity studies [see below], we also constructed a corresponding "Zif selection strain"
that has an intact Zif268 binding site [containing subsites for all three Zif268 fingers]
positioned upstream of the Pwk promoter.)

To perform a selection with one of these three target subsites, we introduced >5 x
10^ members of the phagemid library into the appropriate selection strain and plated the
cells on HIS selective medium. From our earlier controls, we expected that growth would
require three functional fingers; thus, a cell should survive only if it happens to express a
protein with a finger that binds tightly to the target subsite (Figure 2).

Positive candidates identified on HIS selective medium were then checked in
several ways: Each candidate was first tested to verify that the phenotype of growth on
selective medium was linked to the phagemid encoding the zinc finger library candidate
(phagemid-linkage test, see Materials and Methods). Clones that still appeared positive
were then tested to see how well they distinguish among the NRE, TATA, p53, and Zif
subsites (binding site preference test, see Materials and Methods). Finally, clones were
sequenced to determine which amino acids were preferred at the positions that had been
randomized.

Selection of Zinc Fingers that bind the TATA Target Subsite. From the ~5 x 10^
zinc finger variants introduced into the TATA selection strain, we identified 50 candidates
with a phagemid-linked phenotype. Based on their ability to distinguish among the TATA,
p53, NRE and Zif subsites, these candidates can be categorized into three groups. Group I
candidates bind specifically to the TATA target subsite. Group II candidates bind
semi-specifically (with a strong preference for the TATA subsite over the Zif subsite);
Group III candidates bind non-specifically to all four subsites tested (with a preference for
the Zif and p53 subsites over the TATA and NRE subsites). Amino acid sequences are
shown in Figures 3A (Groups I and II) and 3D (Group III) and reveal striking conserved
patterns for each of the groups.

Selection of Zinc Fingers that bind the p53 Target Subsite. From the ~1.3 x 10^
zinc finger variants introduced into the p53 selection strain, we identified 43 candidates that
demonstrate a phagemid-linked phenotype. Based on their ability to distinguish among the
four different subsites, these candidates can be categorized into three groups. Group I
candidates bind specifically to the p53 target subsite. Group II candidates bind
semi-specifically (with a general preference for the p53 subsite over the Zif subsite);
Group III candidates bind non-specifically to all four subsites tested (again with a slight
preference for the Zif and p53 subsites over the TATA and NRE subsites). The amino acid
sequences of the recognition helices of these candidates are shown in Figures 3B (Groups I
and II) and 3D (Group III). Striking patterns of conserved residues are seen in each group.

Selection of Zinc Fingers that bind the NRE Target Site. Approximately 2 x 10^ zinc
finger variants were introduced into the NRE selection strain, and we obtained two
candidates that demonstrated a phagemid-linked phenotype. One candidate binds
specifically to the NRE target subsite (and also exhibits very weak binding to the TATA
subsite). The second candidate binds non-specifically to all four subsites tested (with a
preference for the Zif and p53 subsites over the NRE and TATA subsites). We selected a
finger with a similar recognition helix sequence using reporter cells harboring the
Pwk-HIS3-lacZ operon (data not shown).

To isolate additional clones that recognize the NRE subsite, we performed a
modified two-stage selection procedure. In the first stage, we repeated the selection for the
NRE subsite and pooled 50% of the surviving colonies (approximately 450 candidates). In
the second stage, finger-encoding phagemids isolated from this enriched pool (see Materials
and Methods) were then re-introduced into the NRE selection strain and plated again on
selective medium. All 24 colonies chosen for further analysis displayed a phagemid-linked
phenotype, and these zinc fingers could be categorized into two groups on the basis of their
observed specificities. Group I sequences bind well to the target NRE subsite (with very
weak binding to the TATA subsite). Group III candidates bind non-specifically to all four
subsites tested (with a preference for the Zif and p53 subsites over the NRE and TATA
subsites). The recognition helix sequences of all of the selected candidates are shown in
Figures 3C (Group I) and 3D (Group III). As with our other selections, striking patterns of
conserved residues are observed in each of these groups.


Discussion 

Selection of variant zinc fingers with altered DNA-binding specificities using a
bacterial-based selection method. Our bacterial-based selection system is designed to
rapidly identify and characterize protein-DNA and protein-protein interactions. To test our
method, we performed selections to identify variant zinc fingers that would bind selectively
to desired target DNA subsites. We discuss these results in some detail in the following
paragraphs, but our main observation is that the affinity and specificity of the selected
fingers seem comparable, if not superior, to those obtained in earlier phage display studies
(which required multiple rounds of selection and amplification).

For the TATA selection, subsite-specific fingers identified by our method (TATA
Group I) define two consensus sequences, and these closely match the two consensus
sequences observed in fingers isolated by phage display (Figure 3A). However, the
randomization scheme used in constructing our library allowed aromatic amino acids (Phe,
Tyr and Trp) that were not represented in the codon scheme used for the corresponding
phage display library (6, 13). One consensus sequence obtained with our selection appears
to specify an aromatic residue at position 5 of the recognition helix (NSGAΦN, where Φ is
an aromatic residue). The corresponding phage display-derived consensus (NSGA_N) does
not define any particular class of residues at this position. Our selection also yielded
another class of fingers that appear to be semi-specific for the TATA subsite (TATA Group
II fingers). The sequences of these fingers also match one of the phage display consensus
sequences, but all (except one) of these semi-specific fingers are distinguishable from the
specific fingers (TATA Group I) by the presence of either an asparagine at position 5 or a
positively charged residue at position 6 (Figure 3A). Thus, the results for this subsite are
quite clear: our selection yields fingers that bind specifically to the TATA subsite, and the
sequences of these fingers match well with those isolated by phage display.

For the p53 selection, we isolated a number of fingers that bind specifically to the
intended target subsite (p53 Group I). The recognition helix sequences of two of these
fingers match the consensus sequence of those obtained by phage display (Figure 3B). We
note that the remaining p53 Group I fingers have an aromatic residue at either position -1
or 2 of the recognition helix and thus would not have been present in libraries used for
earlier phage display experiments. In addition, fingers isolated by our method that bind
semi-specifically to the p53 subsite (p53 Group II fingers) all possess a tryptophan at
position 2. Although the nature of some of the sequence-specific contacts made by these
fingers is unclear, the conservation of specific aromatic residues at certain positions
suggests an important role in DNA recognition. Again, our results with this subsite are very
encouraging: our selection yields a number of fingers that bind specifically to the p53 target
subsite. Some of these fingers match the consensus obtained by phage display while others
suggest that aromatic residues may play an important role in zinc finger-DNA recognition.

For the NRE target subsite, an initial attempt using our new selection method
yielded only one finger (NSGSWK) that bound preferentially to the target sequence. Based
on our existing knowledge of zinc finger-DNA recognition (reviewed in 21), one can
postulate reasonable contacts between recognition helix residues of this finger and bases in
the primary strand of the NRE subsite (Figure 3C). However, we were initially concerned
by the relatively low frequency of fingers selected for this site, and we repeated the
selection using an additional enrichment step in an attempt to isolate more fingers. The
great majority of sequences isolated this way had the same amino acid sequence as the
candidate originally selected (NSGSWK) but two closely related sequences (NSGSIPC and
NHGSWK) were also identified. These results suggested that we might have obtained a
small number of clones merely because very few candidates in our library can pass the
threshold set in our NRE selection.

As shown in Figure 3C, the sequences of fingers isolated in our NRE selections do
not match the consensus sequence for fingers selected by phage display. We performed
several experiments to explore the basis of this difference: We first checked our library by
sequencing random candidates to ensure that there was no drastic bias in nucleotide
distribution and were able to rule this out as a plausible explanation (unpublished data). We
then decided to directly introduce (in exactly the same context) one of the fingers
(TRTNKS) that had been selected by phage display (6) and see whether it could work in
our system as a Gal11P-zinc finger fusion protein. We find that NRE selection strain cells
expressing this TRTNKS finger fusion protein grow very poorly on HIS selective medium
whereas the same cells expressing the NSGSWK finger fusion (obtained in our selections)
grow robustly (unpublished data). The simplest explanation for this result is that the
TRTNKS finger fusion binds poorly to the NRE subsite and therefore only weakly
stimulates HIS3 expression. This explanation is supported by our observation that earlier
selections with the NRE subsite, using a prototype of our system in which zinc fingers were
expressed from a much higher copy number phagemid, had yielded the TRTNKS as well as
the NSGSWK finger (J.K.J. and C.O.P., unpublished data). This suggests that our current
system sets a very stringent standard for the NRE selections and may account for why we
isolated such a small number of specific candidates.

We also used our binding site preference assay to compare the specificity of the
NSGSWK finger we had selected for the NRE subsite with that of the TRTNKS finger
selected by phage display. In our bacterial-based assays, the NSGSWK finger binds
specifically to the NRE subsite and binds only very weakly to the TATA subsite. By
contrast, the TRTNKS finger binds only weakly to all four subsites (exhibiting a preference
for the NRE and TATA subsites over the p53 and Zif subsites) (unpublished data). These
results suggest that the NSGSWK finger we selected actually binds more tightly and
specifically in our system than the TRTNKS finger identified earlier by phage display.

Each of our three selections also yielded a small percentage of fingers that bind
non-specifically to all four DNA subsites tested. Surprisingly, all of these fingers match a
consensus sequence of the form R+WL+L (where + denotes a positively charged residue,
Figure 3D). These fingers are rich in positive charge and may make extra phosphate
contacts. We also note that all of these fingers have a tryptophan residue at position 2 and
thus would not have been present in the libraries used for earlier phage display experiments.
This highly conserved set of non-specific fingers raises many interesting questions: What
level of specificity is required for a zinc finger protein to function in our assay (and thus to
what extent does the E. coli chromosome function as a non-specific competitor)? How do
these fingers bind? Why is this particular class of non-specific fingers the only type selected
in our system?
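
The R+WL+L consensus can be expressed as a pattern and used to flag candidate non-specific fingers among sequenced recognition helices. In the Python sketch below, the six letters correspond to helix positions -1, 1, 2, 3, 5, and 6. Treating only lysine and arginine as positively charged is an assumption (histidine is excluded), the function name is illustrative, and the test sequence RKWLRL is a made-up example rather than a sequence reported in Figure 3D.

```python
import re

POSITIVE = "KR"  # assumption: Lys and Arg only; His is not counted

# R, positive, W, L, positive, L at helix positions -1, 1, 2, 3, 5, 6.
NONSPECIFIC_CONSENSUS = re.compile(rf"^R[{POSITIVE}]WL[{POSITIVE}]L$")


def matches_nonspecific_consensus(helix):
    """Return True if a six-residue helix string fits the R+WL+L form."""
    return bool(NONSPECIFIC_CONSENSUS.match(helix.upper()))


for helix in ("RKWLRL", "NSGSWK", "TRTNKS"):
    print(helix, matches_nonspecific_consensus(helix))
```

Of the three strings, only the hypothetical RKWLRL fits the pattern; the NSGSWK and TRTNKS helices discussed in the text do not.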

In summary, the TATA and p53 subsite selections demonstrate that our
bacterial-based system can isolate fingers similar to those obtained previously by phage
display. Only a few fingers were obtained in the NRE subsite selections, but it appears that
these may actually bind with better affinity and specificity than those obtained by phage
display. Most significantly, we believe our new method offers a more rapid alternative to
phage display because it permits functional fingers to be isolated in a single selection step
instead of using multiple rounds of enrichment. We also note that (as with recent phage
display efforts from this lab and other laboratories) we took no special precautions to
perform our selections in an anaerobic environment. We envision that our rapid
bacterial-based system will be particularly useful for projects requiring multiple zinc finger
selections (performed either in parallel or sequentially).

General strategies for studying protein-DNA and protein-protein interactions
utilizing our bacterial-based two-hybrid selection system. This report demonstrates that our
bacterial-based system can be used in a manner analogous to the yeast one-hybrid method
to identify variant zinc fingers that bind to a specific DNA subsite. We have also found that
a number of other eukaryotic DNA binding domains can readily function in our system
(J. Miller, J. Kanter, J.K.J., E.I.R., and C.O.P., unpublished results). Thus, we expect that
our method could also be readily used to identify DNA-binding proteins from cDNA
libraries or random peptide libraries.

With a few minor modifications, our selection method could also be used to identify
and study protein-protein or protein-peptide interactions. In this application (analogous to
the yeast two-hybrid method), the protein target (the "bait" or domain Y in Figures 1A and
1B) could be fused to either the dimeric α subunit or to the monomeric ω subunit of RNA
polymerase. The protein or peptide library to be analyzed (the "prey" or domain X in
Figures 1A and 1B) could be fused to either a dimeric (e.g., bacteriophage λcI protein) or
monomeric (e.g., Zif268) DNA binding domain. (Previous experiments have shown that
different interacting proteins X and Y can effect transcriptional activation and that the
magnitude of this activation correlates well with the strength of the X-Y interaction
[reviewed in 22].) The reporter in this application would be the Pwk-HIS3-aadA operon
bearing an upstream binding site for the particular DBD used in the experiment. As with
other applications of our system, the phagemid rescue feature simplifies and reduces the
time required to analyze plasmid linkage and to test interaction specificity.

Our bacterial-based selection system offers a number of potentially significant
advantages over analogous yeast-based one-hybrid and two-hybrid methods (reviewed in
7). In particular, it offers the ability to analyze libraries larger than 10^ in size, a faster
growth rate, greater potential permeability to small molecules (23), the absence of a
requirement for nuclear localization, and the possibility of studying proteins that are toxic
when expressed in yeast. To our knowledge, this report is the first description of a
bacterial-based "two hybrid" system that has actually been used to identify candidates of
interest from a large library (>10^ in size). Our HIS3-based system provides a rapid selection
method with a low false positive rate, and it can easily be titrated to be more or less stringent
simply by varying the concentration of 3-AT inhibitor in the medium. Our method is also
amenable to high-throughput analysis and automation, as many steps are performed in a
96-well format. We envision that our genetic selection method will provide a powerful,
broadly applicable tool for identifying and characterizing both protein-DNA and
protein-protein interactions.

Table 1. Effects of fusion proteins on HIS3 expression from the Pzif promoter

"Zif reporter strain" cells (see text) expressing the indicated fusion proteins were tested for
growth on HIS selective medium.

Fusion proteins expressed      Growth on HIS Selective Medium

Gal4 only                      No growth

Gal11P-Zif123 and αGal4        Growth

Gal11P-Zif12 and αGal4         No growth

Example 2

In order to determine if bacterial cells can be sorted by FACS according to the
methods of the present invention, we first tested the behavior of several different
fluorescent proteins in our system.

We originally tried the promoter constructs described in Example 1 above with
EGFP as the reporter gene, but decided that a stronger signal would be more useful. We
placed the reporter construct on a low copy number p15A origin / chloramphenicol resistant
plasmid rather than the single copy F factor. We then cloned the alpha-gal4 fusion and its
lpp-UV5 promoter onto a low copy number plasmid with the RK2 origin and tetracycline
resistance.

plasmid       origin   copies per cell   inducer   antibiotic resistance

reporter      p15A     20-30             N/A       chloramphenicol

alpha-gal4    RK2      ~10               IPTG      tetracycline

DBD-gal11P    ColE1    50-70             IPTG      ampicillin

As Figure 4 illustrates, discernible differences in fluorescence of the host cells can
be detected between a bait protein that binds the DNA site tightly (Z121) versus a bait
protein that does not bind the DNA site tightly (Z12). We tested green fluorescent protein
mut 3.1 (GFP 3.1), enhanced green fluorescent protein (EGFP), enhanced yellow
fluorescent protein (EYFP) and red fluorescent protein (dsRed). In another experiment,
similar results were obtained using Renilla reniformis GFP and GFPmut2 as the reporters.

Figure 5 illustrates that interacting pairs can be isolated from a library using FACS.
Approximately 200,000 cells from a mixed culture containing one "positive" cell for every
10,000 "negative" cells were sorted using a Becton Dickinson FACStar Plus. Nine cells
that were selected based on a high EGFP signal were cultured and analyzed by PCR. True
positive cells should yield a PCR product of approximately 450 basepairs in size (positive
control, lane 3). True negative cells should yield a PCR product of approximately 358
basepairs in size (negative control, lane 2). DNA size markers are in the control lane
marked M. Eight of the nine clones appear to be true positives.
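
The numbers quoted for this sort can be put into a short calculation. The Python sketch below is an illustration only and is not part of the described method; the function names are hypothetical, and the inputs are simply the figures stated above (one positive per 10,000 negatives, about 200,000 cells sorted, eight true positives among nine clones analyzed).

```python
def expected_positives(cells_sorted, positives, negatives):
    """Expected number of positive cells in a sample of the mixed culture."""
    return cells_sorted * positives / (positives + negatives)


def enrichment_factor(positives, negatives, true_positive_clones, clones_analyzed):
    """Fold increase in the fraction of positive cells after sorting."""
    fraction_before = positives / (positives + negatives)
    fraction_after = true_positive_clones / clones_analyzed
    return fraction_after / fraction_before


if __name__ == "__main__":
    # Figures taken from the text: 1 positive per 10,000 negatives,
    # ~200,000 cells sorted, 8 of 9 analyzed clones true positives.
    print(round(expected_positives(200_000, 1, 10_000)))
    print(round(enrichment_factor(1, 10_000, 8, 9)))
```

Under these inputs the sorted sample would be expected to contain about 20 positive cells, and the eight-of-nine result corresponds to an enrichment of nearly 9,000-fold in a single sort. Nine clones is a small sample, so the enrichment figure should be read as a rough estimate.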

In the embodiment described above, both the alpha-Gal4 and the zinc-finger-Gal11P
fusion proteins are induced by the same chemical (IPTG). Accordingly, the concentrations
of the two proteins cannot be varied independently. In order to build a system where the
concentrations of these proteins can be varied independently, we obtained several plasmids
from Hermann Bujard's lab in Germany that make it easy to swap origins, antibiotic
resistance genes, and promoters between plasmids, and we made and tested a number of
different combinations of alpha-Gal4, reporter constructs, and zinc-finger-Gal11P fusions
with different plasmid origins under the control of different promoters. The setup that gave
the best results uses our previous reporter construct (on the p15A origin plasmid), has the
zinc-finger-Gal11P fusion under the control of the pLlacO-1 promoter (IPTG inducible) on
a plasmid with the ColE1 origin and ampicillin resistance (pZE12), and has the alpha-Gal4
fusion under the control of the pLtetO-1 promoter (inducible by anhydrotetracycline, aTc)
on a plasmid with the low copy number pSC101 origin and kanamycin resistance (pZS21).
With the proper concentrations of inducers (IPTG and aTc), we have seen up to 27-fold
activation. This ability to independently control expression of the fusion proteins should
make the system much more powerful, since we can keep the alpha-Gal4 concentration at
the optimal level while adjusting the protein level to a concentration that is appropriate for
the affinity and specificity of the particular protein under study. For example, in an
embodiment where directed evolution is used through subsequent rounds of isolation, one
could start out with high protein expression in the early rounds and then lower the protein
expression in subsequent rounds as the evolved proteins became better and better at binding
the target site tightly and specifically (and this could be done without lowering the
alpha-Gal4 concentration).

plasmid       origin   copies per cell   inducer   antibiotic resistance

reporter      p15A     20-30             N/A       chloramphenicol

alpha-gal4    pSC101   ~10-12            aTc       kanamycin

DBD-gal11P    ColE1    50-70             IPTG      ampicillin

Figure 6 shows preliminary data utilizing this embodiment of the system. It appears
that the system is especially dependent on the concentration of aTc in the media. The
fluorescence of all the samples is normalized with respect to sample #1 (which has the
lowest concentration of IPTG and aTc). The cells are E. coli of the strain DH5alpha-Z1
and were grown for 24 hours at 30°C in minimal media (as described in Example 1, except
that the media contained all 20 amino acids, 50 mM HEPES at pH 7.5, chloramphenicol,
kanamycin, ampicillin, and the indicated concentrations of IPTG and aTc). The cells were
then spun down and resuspended in PBS (phosphate buffered saline) immediately prior to
measurement. The samples were measured on a Becton Dickinson FACScan flow cytometer
with the standard argon-ion laser (488 nm emission) and the standard set of optical filters,
with the EGFP signal measured in channel one. This is similar to the protocol used for cells
expressing dsRed, except that to get an optimal signal the cells have to be grown at room
temperature for 48 hours in standard LB, or grown for 48 hours at 30°C in minimal media
with 10 g/l casamino acids.
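
The normalization to sample #1 described for Figure 6 amounts to dividing every sample's fluorescence by that of the reference sample. The following Python sketch illustrates this under stated assumptions: the function name is hypothetical and the numbers in the example are invented placeholders, not data from Figure 6.

```python
def normalize_to_reference(fluorescence, reference_index=0):
    """Express each sample's fluorescence relative to a reference sample.

    fluorescence: mean fluorescence per sample, in sample order.
    reference_index: position of the reference sample (sample #1 -> index 0).
    """
    reference = fluorescence[reference_index]
    if reference <= 0:
        raise ValueError("reference fluorescence must be positive")
    return [value / reference for value in fluorescence]


if __name__ == "__main__":
    # Placeholder values for illustration only (arbitrary units).
    example_means = [2.0, 4.0, 54.0]
    print(normalize_to_reference(example_means))
```

With this convention the reference sample always has a normalized value of 1, and every other value reads directly as fold change relative to the lowest-inducer condition.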



Example 3 

The ability to simultaneously and independently monitor the interaction of a single
DNA-binding protein with multiple DNA binding sites within a single cell could be very
useful for selecting proteins with differential activation at distinct DNA binding sites.
Separate reporter constructs, each with a separate DNA binding site driving the expression
of a reporter gene that encodes a fluorescent protein with unique fluorescent properties, are
one way to achieve this goal. In order to create such a system using the bacterial two-hybrid
ITS, we decided to use EGFP and dsRed (RFP) as the two reporter genes since they
have different fluorescent emission spectra, but can both be excited by the argon ion lasers
(λ = 488 nm) commonly found in FACS machines. The first reporter construct has a binding
site for the Zif268 protein, a minimal pLac promoter and EGFP as the reporter gene. The
second reporter construct has a binding site for the T11 protein (a protein selected as part of
Example 1), uses a hybrid promoter consisting of the λ pRM promoter with its -35 region
replaced with the -35 region of the pLac promoter, and has dsRed as the reporter gene. The
sequence of the pLac promoter is:

CTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTCGA (SEQ ID NO: 2)

and the sequence of the hybrid promoter is:

CTTTACAATTTATCCCTTGGTCGGCTAGATTTACTCGAG (SEQ ID NO: 3).

To facilitate the introduction of both reporter constructs into the host cell and to
ensure equal quantities of each reporter construct within a given host cell, both reporter
constructs were placed on a single two-color reporter plasmid. The orientation of the key
parts of the two reporter constructs with respect to each other is shown in Figure 7. As
indicated in Figure 7, the two reporter genes are transcribed in different directions and are
thus encoded by opposite strands of DNA; this ensures that transcriptional "read-through"
of one reporter gene will not erroneously affect the expression of the other reporter gene. It
is also important to ensure that the plasmids are designed in such a way that transcriptional
"read-through" doesn't interfere with the plasmid origin of replication or with expression of
the antibiotic resistance gene.

In order to test how well this reporter construct functions in the two-hybrid system,
the two-color plasmid containing both reporter constructs was introduced into host cells
with the α-Gal4 fusion protein and one of three Gal11P-zinc finger fusions: Gal11P-Zif268,
which should interact only with the Zif268 binding site; Gal11P-T11, which should
interact only with the T11 site; and Gal11P-Z12, which should interact with neither binding
site. Overnight cultures of the host cells containing the two-color reporter plasmid and the
appropriate fusion proteins were grown in LB media on a rotating drum incubator at 37°C,
and 10 µl of these saturated cultures were used to inoculate 3 ml cultures of minimal media
(as described in Example 2, except with 10 g/l casamino acids) containing 10 ng/ml aTc
and 100 µM IPTG. These cultures were incubated at 30°C on a rotating drum incubator for
48 hours and then the cultures were diluted 100-fold in phosphate buffered saline (PBS)
and measured on a Becton Dickinson FACScan flow cytometer. The results from each of
these three separate experiments are shown in Figure 8. The data for each experiment are
presented as a dot plot where each dot indicates the amount of EGFP and dsRed (RFP) signal
for a single cell by its position with respect to the X and Y axes. The data for 1000
individual cells are shown for each experiment. The regions R1 and R2 are drawn in the
identical position on all three plots to allow for easy comparisons between the experiments.
These data show that cells containing a bait protein that interacts with only the first DNA
site and cells that contain a bait protein that interacts with only the second DNA site can be
easily separated from each other and from cells containing bait proteins that interact with
neither DNA site.
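
The use of fixed regions on a two-color dot plot can be illustrated with a small gating sketch in Python. Everything specific here is an assumption made for illustration: the rectangular shape of the gates, their numeric boundaries, and the assignment of R1 to "high EGFP, low RFP" and R2 to "low EGFP, high RFP" are hypothetical, since the text does not give the coordinates of R1 and R2 in Figure 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A rectangular region on an EGFP-versus-RFP dot plot."""
    name: str
    egfp_min: float
    egfp_max: float
    rfp_min: float
    rfp_max: float

    def contains(self, egfp, rfp):
        return (self.egfp_min <= egfp < self.egfp_max
                and self.rfp_min <= rfp < self.rfp_max)


# Hypothetical gate boundaries in arbitrary fluorescence units.
R1 = Gate("R1", egfp_min=100, egfp_max=10_000, rfp_min=0, rfp_max=30)
R2 = Gate("R2", egfp_min=0, egfp_max=30, rfp_min=100, rfp_max=10_000)


def classify(egfp, rfp, gates=(R1, R2)):
    """Return the name of the first gate containing the cell, else 'ungated'."""
    for gate in gates:
        if gate.contains(egfp, rfp):
            return gate.name
    return "ungated"


if __name__ == "__main__":
    # Three invented cells: EGFP-only, RFP-only, and neither reporter active.
    for cell in [(500, 5), (5, 500), (5, 5)]:
        print(cell, classify(*cell))
```

Keeping the gates identical across experiments, as the text describes for R1 and R2, is what makes the counts in each region directly comparable between the three fusion proteins.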

Preliminary results using this embodiment of the two-color flow ITS system to
select a partially randomized zinc finger, from a library of approximately 2 x 10^ members,
with a preference between two similar DNA sites are encouraging, and at least one selected
clone shows a statistically significant differential activation in favor of the desired site.
Two sequential rounds of sorting were required to isolate positive clones in this experiment.
A population of host cells containing the library of randomized zinc fingers was sorted to
obtain cells with the desired amount of EGFP and RFP expression. This pool of selected
cells was then amplified and the resulting population of cells was sorted a second time. In
our current versions of both the one- and two-color flow ITS, multiple rounds of sorting
appear to be necessary when sorting for rare clones (<1 positive per 10^ negatives), since
there is enough variation in fluorescence among individual, genetically identical cells to
allow a small proportion of genetically negative cells (i.e., cells without a bait protein that
interacts with the desired DNA binding site) to have a fluorescent signal that is similar to
the signal of the average genetically positive cell.
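
The reasoning about why rare clones need more than one round of sorting can be made concrete with a simple model. The Python sketch below is a minimal illustration under assumptions that are not stated in the text: each round keeps a fixed fraction of positive cells and a fixed, small fraction of negative cells, the high fluorescence of a negative cell is not inherited, and amplification between rounds does not change the positive fraction. The rates and starting frequency in the example are invented.

```python
def purity_after_round(positive_fraction, true_positive_rate, false_positive_rate):
    """Fraction of sorted cells that are genetically positive after one round."""
    kept_positive = positive_fraction * true_positive_rate
    kept_negative = (1.0 - positive_fraction) * false_positive_rate
    return kept_positive / (kept_positive + kept_negative)


def purity_by_round(initial_fraction, true_positive_rate, false_positive_rate, rounds):
    """Positive fraction after each successive round of sorting."""
    purities = []
    fraction = initial_fraction
    for _ in range(rounds):
        fraction = purity_after_round(fraction, true_positive_rate, false_positive_rate)
        purities.append(fraction)
    return purities


if __name__ == "__main__":
    # Invented example: 1 positive per million cells, half of the positives fall
    # in the sort gate, and 1 in 1,000 negatives falls in the gate by chance.
    for round_number, purity in enumerate(purity_by_round(1e-6, 0.5, 1e-3, 3), start=1):
        print(round_number, f"{purity:.4f}")
```

In this toy example the first round leaves positives as a small minority of the sorted pool, while the second round brings them to a substantial fraction, which is consistent with the qualitative observation above that rare clones required sequential rounds.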

Example 4

In addition to selecting proteins that bind to a specific DBS, this bacterial ITS can
also be used to select DBSs that interact with a specific protein. Figure 9 shows the results
for such an in vivo site selection to select DNA sequences that interact with the P53^
protein. The consensus binding site for the P53^ protein, as determined by Wolfe et al.,
JMB 285, pp. 1917-1934 (1999), is CXGGACACSTX, where X indicates no clear sequence
preference at that position. A library of EGFP reporter plasmids containing the partially
randomized binding site CGGGANNNNNG was created (where N indicates a mixture of
A, G, C, T) and introduced into host cells containing the α-Gal4 and Gal11P-P53^ fusion
proteins. These cells were then grown to saturation at 37°C in LB media with the
appropriate antibiotics, and then 100 µl of this culture was used to inoculate 10 ml of
minimal media (as described in Example 3) containing 10 ng/ml aTc and 100 µM IPTG.
These cultures were then incubated for 24 hours at 30°C on a rotating drum incubator.
After incubation, one round of FACS sorting was performed on a Cytomation MoFlo
multiple laser FACS sorter and individual EGFP positive clones were selected. Of 20
clones analyzed, 16 were EGFP positive (i.e., expressed at least 2-fold more EGFP than
control cells). These 16 positive clones contained three unique P53^ binding sites. The
most abundant of these sites matched the consensus from the in vitro site selection.
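
The size of the randomized binding-site library follows directly from the number of N positions. The Python sketch below enumerates the sites encoded by such a pattern. It is an illustration only: the function name is hypothetical, and the pattern is written as the contiguous string CGGGANNNNNG on the assumption that the space inside the printed sequence is a typesetting artifact.

```python
from itertools import product


def expand_site(pattern, alphabet="ACGT"):
    """List every DNA sequence encoded by a pattern in which N means any base."""
    n_positions = pattern.count("N")
    sites = []
    for bases in product(alphabet, repeat=n_positions):
        fill = iter(bases)
        # Substitute the chosen bases into the N positions, left to right.
        sites.append("".join(next(fill) if ch == "N" else ch for ch in pattern))
    return sites


if __name__ == "__main__":
    library = expand_site("CGGGANNNNNG")
    print(len(library))   # number of distinct sites in the library
    print(library[:3])    # first few members
```

Five randomized positions give 4^5 = 1024 possible sites, so a library of this design is small enough to be covered many times over in a single sort.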

In order to compare the in vivo interaction between the P53^ protein and each of the
three selected sites, reporter plasmids containing each of the three selected sites were
introduced into host cells containing either the Gal11P-P53^ fusion protein or the
Gal11P-only control protein (i.e., Gal11P without an attached DBD). Dividing the mean EGFP
fluorescence of the Gal11P-P53^ containing cells by the mean fluorescence of the otherwise
identical Gal11P-only cells gives the fold-activation for each site reported in the figure.
Four clones were also picked at random from the library and all of these clones had less
than two-fold activation.
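
The fold-activation calculation described above is a ratio of two means. A minimal Python sketch follows; the function names are hypothetical, the fluorescence values are invented placeholders, and the two-fold cutoff is taken from the definition of an EGFP positive clone given earlier in this example.

```python
from statistics import mean


def fold_activation(test_cells, control_cells):
    """Mean fluorescence of fusion-protein cells divided by that of control cells."""
    return mean(test_cells) / mean(control_cells)


def is_positive(fold, threshold=2.0):
    """A site counts as activated if it reaches the two-fold cutoff used in the text."""
    return fold >= threshold


if __name__ == "__main__":
    # Invented per-cell EGFP readings (arbitrary units) for one reporter site.
    with_dbd = [40, 60, 55, 45]   # cells expressing the Gal11P-DBD fusion
    control = [10, 12, 8, 10]     # otherwise identical Gal11P-only cells
    fold = fold_activation(with_dbd, control)
    print(fold, is_positive(fold))
```

Because the control cells differ only in lacking the DNA-binding domain, the ratio isolates the contribution of site recognition from the background activity of the reporter.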




Equivalents 

The present invention provides, among other things, novel methods and compositions
for interaction trap assays. While specific embodiments of the subject invention have been
discussed, the above specification is illustrative and not restrictive. Many variations of the
invention will become apparent to those skilled in the art upon review of this specification.
The appended claims are not intended to claim all such embodiments and variations, and
the full scope of the invention should be determined by reference to the claims, along with
their full scope of equivalents, and the specification, along with such variations.

All publications and patents mentioned herein, including those items listed below,
are hereby incorporated by reference in their entirety as if each individual publication or
patent was specifically and individually indicated to be incorporated by reference. In case
of conflict, the present application, including any definitions herein, will control.

We Claim:

1. A method for selecting an interacting pair of test polypeptides, comprising:

i providing a population of prokaryotic host cells wherein each host cell
contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a DNA-binding domain and a first test
polypeptide,

(c) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including an activation tag and second test
polypeptide,

wherein the first fusion protein is part of a library of at least 10^ members, the
second fusion protein is part of a library of at least 10^ members, or the first and second
fusion proteins are both members of a library such that at least 10^ unique pairs of test
polypeptides could be tested for interaction;

wherein interaction of a first fusion protein and a second fusion protein in a host cell
results in a desired level of expression of the reporter gene;

wherein the desired level of expression of the reporter gene confers a growth
advantage on the host cell; and

ii isolating host cells with a growth advantage wherein said cells comprise a
first fusion protein and a second fusion protein which interact thereby selecting an
interacting pair of test polypeptides.

2. The method of claim 1, which further comprises the step of identifying nucleic acids
which encode test polypeptides which cause the desired level of expression of the reporter
gene.

3. The method of claim 1, wherein selective growth conditions are applied to the host
cells.

4. The method of claim 1, wherein the desired level of expression of the reporter gene
is an increase in the level of expression of the reporter gene as compared to the basal
expression level of the reporter gene.

5. The method of claim 1, wherein the transcriptional regulatory sequence includes at
least two binding sites for a DNA-binding domain.

6. The method of claim 1, wherein the reporter gene encodes a gene product that gives
rise to a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance.

7. The method of claim 1, wherein the degree of the growth advantage conferred by
the desired level of expression of the reporter gene is controllable by varying the growth
conditions of the host cell.

8. The method of claim 7, wherein the reporter gene is the yeast His3 gene.

9. The method of claim 7, wherein the reporter gene is the yeast His3 gene and the
degree of the growth advantage is controllable by exposing the host cell to varying
concentrations of 3-aminotriazole.

10. The method of claim 7, wherein the reporter gene is a β-lactamase gene.

11. The method of claim 10, wherein the β-lactamase gene is selected from the group
consisting of TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3,
PSE-4 and CTX-1, and functional fragments thereof.

12. The method of claim 11, wherein the β-lactamase gene is TEM-1.

13. The method of claim 10, wherein the reporter gene is a β-lactamase gene and the
degree of the growth advantage is controllable by exposing the host cell to a β-lactam
antibiotic.

14. The method of claim 7, wherein the reporter gene is a β-lactamase gene and the
degree of the growth advantage is controllable by exposing the host cell to a β-lactam
antibiotic and varying concentrations of a β-lactamase inhibitor.

15. The method of claim 13, wherein the β-lactam antibiotic is selected from the group
consisting of penicillins, cephalosporins, monobactams and carbapenems.

16. The method of claim 14, wherein the β-lactam antibiotic is selected from the group
consisting of penicillins, cephalosporins, monobactams and carbapenems and the
β-lactamase inhibitor is selected from the group consisting of clavulanic acid, sulbactam,
tazobactam, brobactam and β-lactamase inhibitory protein (BLIP).

17. The method of claim 1, wherein the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, a functional fragment of
an RNA polymerase subunit, a molecule covalently fused to RNA polymerase, a molecule
covalently fused to an RNA polymerase subunit, a molecule covalently fused to a
functional fragment of RNA polymerase, or a molecule covalently fused to a functional
fragment of an RNA polymerase subunit.

18. The method of claim 1, wherein the activation tag is a polypeptide, a nucleic acid, or
a small molecule, and wherein the activation tag binds RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit.

19. The method of claim 1, wherein the activation tag interacts indirectly with RNA
polymerase via at least one intermediary polypeptide, nucleic acid, or small molecule,
which functionally links the activation tag to the RNA polymerase.

20. The method of claim 18, wherein the activation tag is a fragment of Gal11P, and
wherein the activation tag interacts with a fusion between Gal4 and the α subunit of RNA
polymerase.

21. The method of claim 1, wherein the prokaryotic host cell further contains a second
reporter gene.

22. The method of claim 21, wherein interaction of a first fusion protein and a second
fusion protein in a host cell results in a desired level of expression of the second reporter
gene.

23. The method of claim 22, wherein the desired level of expression of the second
reporter gene is an increase in the level of expression of the reporter gene as compared to
the basal expression level of the reporter gene.

24. The method of claim 22, wherein host cells are isolated that have one reporter gene
whose expression level is increased to a greater extent than the increase in the expression
level of the other reporter gene, as compared to the basal level of expression of the reporter
genes.

25. The method of claim 21, wherein the first and second reporter genes are operably
linked to the same transcriptional regulatory sequence.

26. The method of claim 21, wherein the first and second reporter genes are operably
linked to separate copies of the same transcriptional regulatory sequence.

27. The method of claim 21, wherein the first and second reporter genes are operably
linked to different transcriptional regulatory sequences.

28. The method of claim 21, wherein the second reporter gene encodes a gene product
that gives rise to a detectable signal selected from the group consisting of color,
fluorescence, luminescence, a cell surface tag, cell viability, relief of a cell nutritional
requirement, cell growth and drug resistance.

29. The method of claim 28, wherein the second reporter gene confers a growth
advantage under selective conditions different from the conditions used for the first reporter
gene.

30. The method of claim 29, wherein the host cells containing a first and second fusion
protein that interact are isolated by:

i selecting a first population of host cells with a desired expression level of the
first reporter gene followed by selecting a second population of host cells from the first
population of host cells based on a desired expression level of the second reporter gene;

ii selecting a first population of host cells with a desired expression level of the
second reporter gene followed by selecting a second population of host cells from the first
population of host cells based on a desired expression level of the first reporter gene; or

iii selecting a population of host cells based on simultaneous selection of
desired expression levels of the first and second reporter genes.

31. The method of claim 28, wherein the second reporter gene is the lacZ gene.

32. The method of claim 28, wherein the second reporter gene encodes a fluorescent
protein.

33. The method of claim 32, wherein the fluorescent protein is selected from the group
consisting of green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP),
Renilla reniformis green fluorescent protein, GFPmut2, GFPuv4, enhanced yellow
fluorescent protein (EYFP), enhanced cyan fluorescent protein (ECFP), enhanced blue
fluorescent protein (EBFP), citrine and red fluorescent protein from Discosoma (dsRED).

34. The method of claim 28, wherein the second reporter gene encodes a protein which
is expressed on the surface of the host cell.

35. The method of claim 34, further comprising the steps of:

iii contacting the host cell with a fluorescently labeled antibody specific for the
protein encoded by the second reporter gene thereby labeling the host cell; and

iv isolating the cells expressing the second reporter gene using FACS analysis;

wherein steps iii and iv may occur before, after, or concurrently with step ii.

36. The method of claim 34, further comprising:

iii isolating host cells expressing the protein encoded by the second reporter
gene using affinity chromatography,

wherein isolation of the host cells based on expression of the second reporter gene
may occur before or after isolation of the host cells based on a desired level of expression
of the first reporter gene.

37. The method of claim 36, wherein the affinity chromatography is carried out using a
solid support or magnetic particles.

38. The method of claim 21, wherein the first reporter gene is selected from the group
consisting of the yeast His3 gene and a β-lactamase gene and the second reporter gene is
selected from the group consisting of the lacZ gene, a fluorescent protein, a protein which is
expressed on the surface of the host cell and the bacterial aadA gene.

39. The method of claim 1, wherein the first and second fusion proteins are expressed
from the same nucleic acid construct.

40. The method of claim 1, wherein the first and second fusion proteins are expressed
from separate nucleic acid constructs.

41. The method of claim 1, wherein the expression level of the first, second, or first and
second fusion proteins can be controlled by varying the growth conditions of the host cell.

42. The method of claim 41, wherein the expression level of the first and second fusion
proteins can be controlled by varying the concentration of IPTG, anhydrotetracycline, or
IPTG and anhydrotetracycline to which the host cell is exposed.

43. The method of claim 42, wherein the first, second, or first and second fusion
proteins are expressed from a promoter comprising a binding site for the lac repressor or the
tet repressor.

44. The method of claim 41, wherein the expression level of the first and second fusion
protein can be independently controlled.

45. The method of claim 1, wherein the first fusion protein is part of a library of at least
10 members, the second fusion protein is part of a library of at least 10 members, or the
first and second fusion proteins are both members of a library such that at least 10^ unique
pairs of test polypeptides could be tested for interaction.

46. The method of claim 1, wherein the host cell is selected from the group consisting
of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella,
Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella.

47. The method of claim 1, wherein (a), (b), or (c), or any combination thereof, are
contained within one or more vectors for introduction into the host cell.

48. The method of claim 47, wherein the vector is a phagemid and is introduced into the
host cell by infection of the host cell with infectious phage containing the phagemid vector.

49. A method for identifying agents which modulate a protein-protein interaction,
comprising:

i providing a population of prokaryotic host cells wherein each host cell
contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a DNA-binding domain and a first test
polypeptide,

(c) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including an activation tag and second test
polypeptide,

wherein the prokaryotic host cell is an imp- or gram positive strain of bacteria;

wherein interaction of a first fusion protein and a second fusion protein in a host cell
results in a desired level of expression of the reporter gene;

ii contacting the host cell with at least one test agent; and

iii identifying test agents which modulate expression of the reporter gene in a
manner also dependent on the expression of the first and second test polypeptides,
thereby identifying agents which modulate a protein-protein interaction.

50. The method of claim 49, wherein the reporter gene encodes a gene product that
gives rise to a detectable signal selected from the group consisting of color, fluorescence,
luminescence, a cell surface tag, cell viability, relief of a cell nutritional requirement, cell
growth and drug resistance.

51. The method of claim 49, which further comprises comparing the level of expression
of the reporter gene to a level of expression in a control experiment wherein one or both of
the test polypeptides are absent or altered so as to preclude interaction of the first and
second fusion proteins.

52. The method of claim 49, wherein the test agent is selected from the group consisting
of peptides, nucleic acids, carbohydrates, natural product extract libraries, and small
organic molecules.

53. The method of claim 49, wherein the test agent is part of a library of test agents.

54. The method of claim 53, wherein the library of test agents has at least 10^ members.

55. The method of claim 49, wherein test agents are identified which agonize the
protein-protein interaction based on a change in the expression level of the reporter gene in
the presence of the test agent.

56. The method of claim 49, wherein test agents are identified which antagonize the
protein-protein interaction based on a change in the expression level of the reporter gene in
the presence of the test agent.

57. The method of claim 49, wherein the host cells are grown under conditions which
increase the permeability of the cell membrane.

58. A method for selecting a polypeptide which differentially interacts with at least two 
different test polypeptides, comprising: 

i providing a population of prokaryotic host cells wherein each cell contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a second DNA-binding domain, 

(c) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a first DNA-binding domain and a first test
polypeptide,

(d) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including a second DNA-binding domain and a
second test polypeptide,

(e) a third chimeric gene which encodes a third fusion protein, the third 
fusion protein including an activation tag and third test polypeptide, 

wherein the third fusion protein is part of a library of at least 10^ members; 

wherein interaction of the first fusion protein and the third fusion protein in the host
cell results in a desired level of expression of the first reporter gene;

wherein interaction of the second fusion protein and the third fusion protein in the 
host cell results in a desired level of expression of the second reporter gene; and 

ii isolating host cells comprising a third fusion protein capable of interacting
with the first fusion protein, the second fusion protein, or the first and the second
fusion proteins based on a desired level of expression of the first reporter gene, the
second reporter gene, or the first and second reporter genes, respectively, thereby
selecting a polypeptide which differentially interacts with at least two different test
polypeptides.

59. The method of claim 58, wherein host cells are isolated which comprise a third 
fusion protein that interacts with both the first and second fusion proteins. 


60. The method of claim 58, wherein host cells are isolated which comprise a third 
fusion protein that interacts to a greater extent with one of the peptides as compared to the 
other polypeptide. 

61. The method of claim 58, wherein the host cell further comprises

(e) a third reporter gene operably linked to a transcriptional regulatory sequence 
which includes a binding site (DBD recognition element) for a third DNA-binding domain, 

(f) a fourth chimeric gene which encodes a fourth fusion protein including a 
third DNA-binding domain and a fourth test polypeptide, 

wherein interaction of the fourth fusion protein and the third fusion protein in the
host cell results in a desired level of expression of the third reporter gene; and

wherein step ii further comprises isolating host cells comprising the third fusion
protein interacting with a fourth fusion protein based on a desired level of expression of the
first, second and third reporter genes.


62. The method of claim 61, wherein the host cell further comprises

(g) a fourth reporter gene operably linked to a transcriptional regulatory 
sequence which includes a binding site (DBD recognition element) for a fourth DNA- 
binding domain, 

(h) a fifth chimeric gene which encodes a fifth fusion protein including a fourth
DNA-binding domain and a fifth test polypeptide,

wherein interaction of the fifth fusion protein and the third fusion protein in the host
cell results in a desired level of expression of the fourth reporter gene; and

wherein step ii further comprises isolating host cells comprising the third fusion
protein interacting with a fifth fusion protein based on a desired level of expression of the
first, second, third and fourth reporter genes.


63. The method of claim 62, wherein the host cell further comprises 

(i) a fifth reporter gene operably linked to a transcriptional regulatory sequence 
which includes a binding site (DBD recognition element) for a fifth DNA-binding domain, 

(j) a sixth chimeric gene which encodes a sixth fusion protein including a fifth
DNA-binding domain and a fifth test polypeptide,

wherein interaction of the sixth fusion protein and the third fusion protein in the host 
cell results in a desired level of expression of the fifth reporter gene; and 

wherein step ii further comprises isolating host cells comprising the third fusion 
protein interacting with a sixth fusion protein based on a desired level of expression of the 
first, second, third, fourth and fifth reporter genes.

64. The method of any one of claims 58 or 61-63, wherein host cells are isolated 

(i) which comprise a third fusion protein that interacts to a desired extent with 
all of the other fusion proteins; 

(ii) which comprise a third fusion protein that interacts with one of the
polypeptides to a greater extent than it interacts with the other fusion proteins; or

(iii) which comprise a third fusion protein that interacts to a desired extent with a 
desired combination of at least two of the other fusion proteins. 

65. The method of any one of claims 58 or 61-63, wherein the desired level of
expression of at least one of the reporter genes is an increase in the level of expression of
the reporter gene as compared to the basal expression level of the reporter gene.

66. The method of any one of claims 58 or 61-63, wherein the reporter genes encode 
unique detectable proteins which can be analyzed independently, simultaneously, or
independently and simultaneously.

67. The method of claim 66, wherein at least one of the reporter genes encodes a
fluorescent protein. 


68. The method of claim 66, wherein the expression level of at least one of the reporter 
genes is analyzed by FACS. 

69. The method of any one of claims 58 or 61-63, which further comprises the step of 
identifying nucleic acids which encode fusion proteins resulting in a desired level of
expression of a reporter gene.

70. A method for selecting a test agent that differentially modulates the interaction of a 
polypeptide with at least two different test polypeptides, comprising: 

i providing a population of prokaryotic host cells wherein each cell contains

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a first chimeric gene which encodes a first fusion protein, the first
fusion protein including a first DNA-binding domain and a first test
polypeptide,

(d) a second chimeric gene which encodes a second fusion protein, the
second fusion protein including a second DNA-binding domain and a
second test polypeptide,

(e) a third chimeric gene which encodes a third fusion protein, the third
fusion protein including an activation tag and third test polypeptide,


wherein the host cell is an imp⁻ or gram positive strain of bacteria;

wherein interaction of the first fusion protein and the third fusion protein in the host
cell results in a desired level of expression of the first reporter gene;

wherein interaction of the second fusion protein and the third fusion protein in the
host cell results in a desired level of expression of the second reporter gene;

ii contacting the host cell with at least one test agent; and

iii identifying test agents which modulate the expression of the first, second, or
first and second reporter genes in a manner also dependent on the expression of the first,
second and third test polypeptides, thereby selecting a test agent that differentially
modulates the interaction of a polypeptide with at least two different test polypeptides.

71. A method for detecting an interaction between a test polypeptide and a DNA 
sequence, comprising

i providing a population of prokaryotic host cells wherein each cell contains

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the DBD recognition element is part of a library of at least 10^ members,
the fusion protein is part of a library of at least 10^ members, or the DBD recognition
element and the fusion protein are both members of a library such that at least 10^ unique
pairs of a DBD recognition element and a fusion protein could be tested for interaction;

wherein interaction between a test polypeptide of a fusion protein and a DBD
recognition element in a host cell results in a desired level of expression of the reporter
gene;

wherein the desired level of expression of the reporter gene confers a growth 
advantage on the host cell; and 

ii isolating host cells with a growth advantage wherein said cells comprise a
fusion protein and a DBD recognition element which interact, thereby detecting an
interaction between a test polypeptide and a DNA sequence.

72. The method of claim 71, further comprising the step of identifying the nucleic acid
which encodes a test polypeptide that interacts with the DBD recognition element DNA
sequence.

73. The method of claim 71, wherein the activation tag is an RNA polymerase, an RNA
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment
of an RNA polymerase subunit.


74. The method of claim 71, wherein the activation tag is an RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, a functional fragment of
an RNA polymerase subunit, a molecule covalently fused to RNA polymerase, a molecule
covalently fused to an RNA polymerase subunit, a molecule covalently fused to a
functional fragment of RNA polymerase, or a molecule covalently fused to a functional
fragment of an RNA polymerase subunit.

75. The method of claim 71, wherein the activation tag interacts indirectly with RNA
polymerase via at least one intermediary polypeptide, nucleic acid, or small molecule,
which functionally links the activation tag to the RNA polymerase.

76. The method of claim 74, wherein the activation tag is a fragment of Gal11P, and
wherein the activation tag interacts with a fusion between Gal4 and the α subunit of RNA
polymerase.

77. The method of claim 71, wherein (a), (b), or (a) and (b), are contained within one or
more vectors for introduction into the host cell.


78. The method of claim 71, wherein the growth advantage is selected from the group
consisting of cell viability, relief of a cell nutritional requirement, cell growth and drug 
resistance. 

79. The method of claim 71, wherein the degree of the growth advantage conferred by
the desired level of expression of the reporter gene is controllable by varying the growth 
conditions of the host cell. 

80. The method of claim 79, wherein the reporter gene is the yeast His3 gene. 


81. The method of claim 79, wherein the reporter gene is the yeast His3 gene and the 
degree of the growth advantage is controllable by exposing the host cell to varying 
concentrations of 3-aminotriazole.

82. The method of claim 79, wherein the reporter gene is a β-lactamase gene.

83. The method of claim 82, wherein the β-lactamase gene is selected from the group
consisting of TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, 
PSE-4 and CTX-1, and functional fragments thereof. 

84. The method of claim 83, wherein the β-lactamase gene is TEM-1.

85. The method of claim 82, wherein the reporter gene is a β-lactamase gene and the
degree of the growth advantage is controllable by exposing the host cell to a β-lactam
antibiotic.


86. The method of claim 79, wherein the reporter gene is a β-lactamase gene and the
degree of the growth advantage is controllable by exposing the host cell to a β-lactam
antibiotic and varying concentrations of a β-lactamase inhibitor.

87. The method of claim 85, wherein the β-lactam antibiotic is selected from the group
consisting of penicillins, cephalosporins, monobactams and carbapenems.

88. The method of claim 86, wherein the β-lactam antibiotic is selected from the group
consisting of penicillins, cephalosporins, monobactams and carbapenems and the
β-lactamase inhibitor is selected from the group consisting of clavulanic acid, sulbactam,
tazobactam, brobactam and β-lactamase inhibitory protein (BLIP).






89. The method of claim 71, wherein the DBD recognition element is a member of a 
library of at least 10^ potential binding sites for a DNA binding domain, wherein host cells 
comprising a DBD recognition element bound by a test polypeptide are isolated. 

90. The method of claim 89, wherein the polypeptide is a zinc finger protein. 

91. The method of claim 71, wherein the DBD recognition element is a desired binding
site for a DNA binding domain and the test polypeptide is a member of a library of at least
10^ polypeptides, wherein host cells comprising a polypeptide which binds to the DBD
recognition element are isolated.

92. The method of claim 71, wherein the DBD recognition element is a desired binding 
site for a DNA binding domain and the test polypeptide is a member of a library of at least
10^ polypeptides, wherein host cells comprising a polypeptide which binds to the DBD
recognition element are isolated.

93. The method of claim 91, wherein the polypeptides are zinc finger proteins. 

94. The method of claim 71, wherein the DBD recognition element is a member of a
library of potential binding sites for a DNA binding domain and the test polypeptide is a
member of a library of polypeptides, wherein host cells comprising a polypeptide that binds
a DBD recognition element are isolated.

95. The method of claim 94, wherein the polypeptides are zinc finger proteins.

96. A polypeptide isolated by the method of any one of claims 91, 92 or 94.

97. The polypeptide of claim 96 which is a zinc finger protein. 

98. A binding site for a DNA binding domain isolated by the method of any one of
claims 89 or 94.



99. The binding site for a DNA binding domain of claim 98 which binds a zinc finger 
protein. 

100. An interacting pair of a polypeptide and a binding site for a DNA binding domain
isolated by the method of claim 94.



101. The interacting pair of claim 100, wherein the polypeptide is a zinc finger protein.






102. A method for identifying agents which modulate an interaction between a test 
polypeptide and a DNA sequence, comprising 

i providing a population of prokaryotic host cells wherein each cell contains 

(a) a reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a DNA-binding domain,

(b) a chimeric gene which encodes a fusion protein, the fusion protein 
including a test polypeptide and an activation tag, 

wherein the prokaryotic host cell is an imp⁻ or gram positive strain of bacteria;

wherein interaction between a test polypeptide of a fusion protein and a DBD
recognition element in a host cell results in a desired level of expression of the reporter
gene;

ii contacting the host cell with at least one test agent; and 

iii identifying agents which modulate expression of the reporter gene in a
manner also dependent on the presence of a fusion protein and a DBD recognition element.

103. The method of claim 102, wherein the reporter gene encodes a gene product that 
gives rise to a detectable signal selected from the group consisting of color, fluorescence, 
luminescence, a cell surface tag, cell viability, relief of a cell nutritional requirement, cell 
growth and drug resistance. 


104. The method of claim 102, wherein the DBD recognition element is part of a library 
of at least 10^ members, the fusion protein is part of a library of at least 10^ members, or the
DBD recognition element and the fusion protein are both members of a library such that at
least 10^ unique pairs of a DBD recognition element and a fusion protein could be tested for
interaction.

105. The method of claim 102, which further comprises comparing the level of 
expression of the reporter gene to a level of expression in a control experiment wherein one 
or both of the test polypeptide and the DBD recognition element are absent or altered so as
to preclude interaction of the fusion protein and the DBD recognition element.

106. The method of claim 102, wherein the test agent is selected from the group 
consisting of peptides, nucleic acids, carbohydrates, natural product extract libraries, and 
small organic molecules. 


107. The method of claim 102, wherein the test agent is part of a library of test agents. 

108. The method of claim 102, wherein test agents are identified which agonize the
protein-nucleic acid interaction based on a change in the expression level of the reporter
gene in the presence of the test agent.

109. The method of claim 102, wherein test agents are identified which antagonize the
protein-nucleic acid interaction based on a change in the expression level of the reporter 
gene in the presence of the test agent. 



110. The method of claim 102, wherein the host cells are grown under conditions which
increase the permeability of the cell membrane.



111. A method for selecting a polypeptide that differentially interacts with at least two 
different DNA sequences, comprising 

i providing a population of prokaryotic host cells each of which contains 

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the fusion protein is part of a library of at least 10^ members;

wherein interaction of a fusion protein with the first DBD recognition element in the
host cells results in a desired level of expression of the first reporter gene;

wherein interaction of a fusion protein with the second DBD recognition element in
the host cells results in a desired level of expression of the second reporter gene; and

ii isolating host cells comprising a fusion protein that interacts with the first
DBD recognition element, the second DBD recognition element, or the first and second
DBD recognition elements based on a desired level of expression of the first reporter gene,
the second reporter gene, or the first and second reporter genes, respectively, thereby
selecting a polypeptide that differentially interacts with at least two different DNA
sequences.

112. The method of claim 111, wherein the fusion protein is assayed for the ability to
interact with at least three different DNA sequences each operably linked to different
reporter genes.

113. The method of claim 111, wherein the first and second reporter genes are operably
linked to the same transcriptional regulatory sequence. 

114. The method of claim 111, wherein the first and second reporter genes are operably
linked to separate copies of the same transcriptional regulatory sequence. 

115. The method of claim 111, wherein the first and second reporter genes are operably 
linked to different transcriptional regulatory sequences.


116. The method of any one of claims 111-115, wherein all of the reporter genes encode 
different proteins and each reporter gene may be detected independently, simultaneously, or 
independently and simultaneously. 

117. The method of any one of claims 111-115, which further comprises the step of
isolating the nucleic acid which encodes the fusion protein. 

118. The method of any one of claims 111-115, wherein host cells are isolated that have 
one reporter gene whose expression level is increased to a greater extent than the increase 
in the expression level of the other reporter genes, as compared to the basal level of
expression of the reporter genes. 

119. A method for selecting a test agent that differentially modulates the interaction of a
polypeptide with at least two different DNA sequences, comprising

i providing a population of prokaryotic host cells each of which contains 

(a) a first reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a first DNA-binding domain,

(b) a second reporter gene operably linked to a transcriptional regulatory
sequence which includes one or more binding sites (DBD recognition
elements) for a second DNA-binding domain,

(c) a chimeric gene which encodes a fusion protein, the fusion protein
including a test polypeptide and an activation tag,

wherein the prokaryotic host cell is an imp⁻ or gram positive strain of bacteria;

wherein interaction of a fusion protein with the first DBD recognition element in the 
host cells results in a desired level of expression of the first reporter gene; 

wherein interaction of a fusion protein with the second DBD recognition element in
the host cells results in a desired level of expression of the second reporter gene;

ii contacting the host cell with at least one test agent; and 

iii identifying test agents which modulate the expression of the first, second, or 
first and second reporter genes in a manner also dependent on the presence of the fusion
protein and the first and second DBD recognition elements, thereby selecting a test agent
that differentially modulates the interaction of a polypeptide with at least two different 
DNA sequences. 



120. The method of claim 119, wherein the DBD recognition element is part of a library
of at least 10^ members, the fusion protein is part of a library of at least 10^ members, or the
DBD recognition element and the fusion protein are both members of a library such that at
least 10^ unique pairs of a DBD recognition element and a fusion protein could be tested for
interaction.

121. A method for detecting an interaction between a test RNA binding domain
polypeptide and an RNA sequence, comprising 

i providing a population of prokaryotic host cells wherein each cell contains 

(a) a reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a DNA-binding domain, 

(b) a first chimeric gene which encodes a fusion protein, the fusion 
protein including a DNA-binding domain and a first RNA binding 
domain, 

(c) a second chimeric gene which encodes a fusion protein, the fusion 
protein including an activation tag and a second RNA binding
domain, 

(d) a third chimeric gene which encodes a hybrid RNA, the hybrid RNA
comprising a first RNA sequence that binds one of the first or second 
RNA binding domains and a second RNA sequence to be tested for 
interaction with the RNA-binding domain not bound to the first RNA 
sequence; 

wherein the RNA-binding domain not bound to the first RNA sequence is part of a 
library of at least 10^ members, the second RNA sequence is part of a library of at least 10^
members, or the RNA-binding domain not bound to the first RNA sequence and the second
RNA sequence are both members of a library such that at least 10^ unique pairs of an RNA- 
binding domain and an RNA sequence could be tested for interaction; 

wherein interaction of an RNA-binding domain not bound to the first RNA 
sequence with the second RNA sequence in a host cell results in a desired level of 
expression of the reporter gene; and 

ii isolating host cells comprising an RNA-binding domain that interacts with
the second RNA sequence based on a desired level of expression of the reporter gene,
thereby detecting an interaction between a test RNA binding domain polypeptide and an 
RNA sequence. 

122. A kit for selecting a polypeptide that interacts with a test polypeptide, comprising: 

i a first gene construct for encoding a first fusion protein, which first gene 
construct comprises: 

(a) transcriptional and translational elements which direct expression of 
a protein in a prokaryotic host cell, 

(b) a DNA sequence that encodes a DNA binding domain and which is 
operably linked with the transcriptional and translational elements of 
the first gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a first test 
polypeptide into the first gene construct in such a manner that the 
first test polypeptide is expressed in-frame as part of a fusion protein
containing the DNA binding domain; 

ii a second gene construct for encoding a second fusion protein, which second 
gene construct comprises: 

(a) transcriptional and translational elements which direct expression of
a protein in a prokaryotic host cell, 

(b) a DNA sequence that encodes an activation tag and which is 
operably linked with the transcriptional and translational elements of 
the second gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a second 
test polypeptide into the second gene construct in such a manner that 
the second test polypeptide is expressed in-frame as part of a fusion 
protein containing the activation tag; and 




iii a prokaryotic host cell containing at least one reporter gene having one or
more binding sites (DBD recognition elements) for the DNA binding domain, and

wherein a desired level of expression of the reporter gene is obtained upon 
interaction of the first and second fusion proteins; and

wherein the desired level of expression of the reporter gene confers a growth 
advantage on the host cell.

123. The kit of claim 122, wherein the reporter gene encodes a gene product that gives 
rise to a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance.

124. The kit of claim 122, wherein the degree of the growth advantage conferred by the 
desired level of expression of the reporter gene is controllable by varying the growth 
conditions of the host cell. 


125. The kit of claim 124, wherein the reporter gene is the yeast His3 gene. 

126. The kit of claim 124, wherein the reporter gene is the yeast His3 gene and the 
degree of the growth advantage is controllable by exposing the host cell to varying
concentrations of 3-aminotriazole.

127. The kit of claim 124, wherein the reporter gene is a β-lactamase gene.

128. The kit of claim 127, wherein the β-lactamase gene is selected from the group
consisting of TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3,
PSE-4 and CTX-1, and functional fragments thereof.

129. The kit of claim 128, wherein the β-lactamase gene is TEM-1.


130. The kit of claim 127, wherein the reporter gene is a β-lactamase gene and the degree
of the growth advantage is controllable by exposing the host cell to a β-lactam antibiotic.

131. The kit of claim 124, wherein the reporter gene is a β-lactamase gene and the degree
of the growth advantage is controllable by exposing the host cell to a β-lactam antibiotic
and varying concentrations of a β-lactamase inhibitor.

132. The kit of claim 130, wherein the β-lactam antibiotic is selected from the group
consisting of penicillins, cephalosporins, monobactams and carbapenems.


133. The kit of claim 131, wherein the β-lactam antibiotic is selected from the group
consisting of penicillins, cephalosporins, monobactams and carbapenems and the
β-lactamase inhibitor is selected from the group consisting of clavulanic acid, sulbactam,
tazobactam, brobactam and β-lactamase inhibitory protein (BLIP).


134. The kit of claim 122, wherein the first gene construct, the second gene construct, or
the first and second gene constructs, are contained within a phagemid vector. 

135. A test polypeptide isolated using the kit of claim 122. 

136. A kit for detecting an interaction between a test DNA-binding domain polypeptide
and a DNA sequence, comprising: 

i a first gene construct which comprises: 

(a) one or more sites for inserting a DNA sequence comprising a 
transcriptional element which includes at least one binding site (DBD
recognition element) for a DNA-binding domain,

(b) a translational element operably linked to the transcriptional element, 
and 

(c) a DNA sequence for a reporter gene which is operably linked with 
the transcriptional and translational elements of the first gene
construct, and

wherein the transcriptional and translational elements direct expression of 
the reporter gene in a prokaryotic host cell; 

ii a second gene construct for encoding a first fusion protein, which second 
gene construct comprises: 

(a) transcriptional and translational elements which direct expression of
a protein in a prokaryotic host cell,

(b) a DNA sequence that encodes an activation tag and which is 

operably linked with the transcriptional and translational elements of 
the second gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a first test
polypeptide into the second gene construct in such a manner that the
first test polypeptide is expressed in-frame as part of a fusion protein
containing the activation tag;

iii a prokaryotic host cell, and 

wherein a desired level of expression of the reporter gene is obtained upon
interaction of a test polypeptide with a DBD recognition element; and

wherein the desired level of expression of the reporter gene confers a growth
advantage on the host cell.

137. The kit of claim 136, wherein the reporter gene encodes a gene product that gives 
rise to a detectable signal selected from the group consisting of cell viability, relief of a cell
nutritional requirement, cell growth and drug resistance.

138. The kit of claim 136, wherein the degree of the growth advantage conferred by the 
desired level of expression of the reporter gene is controllable by varying the growth 
conditions of the host cell. 


139. The kit of claim 138, wherein the reporter gene is the yeast His3 gene. 

140. The kit of claim 138, wherein the reporter gene is the yeast His3 gene and the 
degree of the growth advantage is controllable by exposing the host cell to varying 
concentrations of 3-aminotriazole. 

141. The kit of claim 138, wherein the reporter gene is a β-lactamase gene. 

142. The kit of claim 141, wherein the β-lactamase gene is selected from the group 
consisting of TEM-1, TEM-2, OXA-1, OXA-2, OXA-3, SHV-1, PSE-1, PSE-2, PSE-3, 
PSE-4 and CTX-1, and functional fragments thereof. 

143. The kit of claim 142, wherein the β-lactamase gene is TEM-1. 

144. The kit of claim 141, wherein the reporter gene is a β-lactamase gene and the degree 
of the growth advantage is controllable by exposing the host cell to a β-lactam antibiotic. 



145. The kit of claim 138, wherein the reporter gene is a β-lactamase gene and the degree 
of the growth advantage is controllable by exposing the host cell to a β-lactam antibiotic 
and varying concentrations of a β-lactamase inhibitor. 

146. The kit of claim 144, wherein the β-lactam antibiotic is selected from the group 
consisting of penicillins, cephalosporins, monobactams and carbapenems. 



147. The kit of claim 145, wherein the β-lactam antibiotic is selected from the group 
consisting of penicillins, cephalosporins, monobactams and carbapenems and the 
β-lactamase inhibitor is selected from the group consisting of clavulanic acid, sulbactam, 
tazobactam, brobactam and β-lactamase inhibitory protein (BLIP). 

148. The kit of claim 136, wherein the first gene construct, the second gene construct, or 
the first and second gene constructs, are contained within a phagemid vector. 



149. The kit of claim 136, wherein the activation tag is an RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment 
of an RNA polymerase subunit. 



150. The kit of claim 136, wherein the activation tag is an RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, a functional fragment of 
an RNA polymerase subunit, a molecule covalently fused to RNA polymerase, a molecule 
covalently fused to an RNA polymerase subunit, a molecule covalently fused to a 
functional fragment of RNA polymerase, or a molecule covalently fused to a functional 
fragment of an RNA polymerase subunit. 

151. The kit of claim 136, wherein the activation tag interacts indirectly with RNA 
polymerase via at least one intermediary polypeptide, nucleic acid, or small molecule, 
which functionally links the activation tag to the RNA polymerase. 

152. The kit of claim 136, wherein the activation tag is a fragment of Gal11P, and 
wherein the activation tag interacts with a fusion between Gal4 and the α subunit of RNA 
polymerase. 


153. The kit of claim 136, wherein the desired level of expression of the reporter gene 
confers a growth advantage to the host cells. 

154. A test DNA-binding domain polypeptide isolated using the kit of claim 136. 

155. The DNA-binding domain polypeptide of claim 154, which is a zinc finger protein. 

156. A binding site for a DNA binding domain isolated using the kit of claim 136. 

157. The binding site for a DNA binding domain of claim 156 that binds a zinc finger 
protein. 

158. An interacting pair of a polypeptide and a binding site for a DNA binding domain 
isolated using the kit of claim 136. 


159. The interacting pair of claim 158, wherein the polypeptide is a zinc finger protein. 

160. A method for detecting an interaction between a first test polypeptide and a second 
test polypeptide, comprising: 

i providing a population of host cells wherein each cell contains 

(a) a reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a DNA-binding domain, 

(b) a first chimeric gene which encodes a first fusion protein, the first 
fusion protein including a DNA-binding domain and a first test 
polypeptide, 

(c) a second chimeric gene which encodes a second fusion protein, the 
second fusion protein including an activation tag and second test 
polypeptide, 

wherein expression of the reporter gene results in signal detectable by FACS; 

wherein interaction of the first fusion protein and second fusion protein in the host 
cell results in a desired level of expression of the reporter gene; and 

ii isolating host cells comprising an interacting pair of fusion proteins based on 
a desired level of expression of the reporter gene using FACS thereby detecting an 
interaction between a first test polypeptide and a second test polypeptide. 

161. The method of claim 160, which further comprises the step of isolating the nucleic 
acid which encodes the test polypeptides. 

162. The method of claim 160, wherein the first, second, or first and second fusion 
proteins are members of a library. 

163. The method of claim 162, wherein the first fusion protein is part of a library of at 
least 10^ members, the second fusion protein is part of a library of at least 10^ members, or 
the first and second fusion proteins are both members of a library such that at least 10^ 
unique pairs of test polypeptides could be tested for interaction. 
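
As an illustration of the "unique pairs" language in claim 163, the number of distinct pairings that two libraries can present is the product of their sizes. The following is a minimal sketch, not part of the claims; the function name `unique_pairs` and the library sizes are hypothetical and chosen only for illustration, since the exponents in the claim text are not legible in this copy.

```python
def unique_pairs(first_library_size: int, second_library_size: int) -> int:
    """Count distinct (first, second) test-polypeptide pairings.

    Assumes every member of the first library can be co-expressed with
    every member of the second library in some host cell.
    """
    if first_library_size < 1 or second_library_size < 1:
        raise ValueError("library sizes must be positive")
    return first_library_size * second_library_size


# Hypothetical library sizes, for illustration only.
print(unique_pairs(1_000, 10_000))
```

When only one partner is a library and the other is a single fixed polypeptide, the same function applies with a size of 1 for the fixed partner.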

164. The method of claim 160, wherein the host cell is a eukaryotic cell. 

165. The method of claim 164, wherein the host cell is a yeast cell. 

166. The method of claim 160, wherein the host cell is a prokaryotic cell. 


167. The method of claim 166, wherein the host cell is selected from the group consisting 
of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, 
Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella. 

168. The method of claim 160, wherein the desired level of expression of the reporter 
gene is an increase in the level of expression of the reporter gene as compared to the basal 
expression level of the reporter gene. 

169. The method of claim 160, wherein the transcriptional regulatory sequence includes 
at least two binding sites for a DNA-binding domain. 

170. The method of claim 160, wherein the transcriptional regulatory sequence includes 
at least three binding sites for a DNA-binding domain. 

171. The method of claim 160, wherein the reporter gene encodes a protein product 
which gives rise to a detectable signal selected from the group consisting of fluorescence, 
luminescence and a cell surface tag. 

172. The method of claim 171, wherein the reporter gene encodes a fluorescent protein 
selected from the group consisting of green fluorescent protein (GFP), enhanced green 
fluorescent protein (EGFP), Renilla reniformis green fluorescent protein, GFPmut2, 
GFPuv4, enhanced yellow fluorescent protein (EYFP), enhanced cyan fluorescent protein 
(ECFP), enhanced blue fluorescent protein (EBFP), citrine and red fluorescent protein from 
Discosoma (dsRED). 

173. The method of claim 171, wherein the reporter gene encodes a cell surface tag and 
the method further comprises the step of contacting the host cell with a fluorescently 
labeled antibody specific for the cell surface tag, thereby labeling the host cell, before 
isolation of host cells by FACS. 

174. The method of claim 160, wherein the host cell further contains a second reporter 
gene. 

175. The method of claim 174, wherein interaction of first fusion protein and a second 
fusion protein in a host cell results in a desired level of expression of the second reporter 
gene. 

176. The method of claim 175, wherein the desired level of expression of the second 
reporter gene is an increase in the level of expression of the reporter gene as compared to 
the basal expression level of the reporter gene. 

177. The method of claim 175, wherein the desired level of expression of the first 
reporter gene is an increase in the expression level of the first reporter gene as compared to 
the basal expression level of the first reporter gene, and the desired level of expression of 
the second reporter gene is a smaller increase in the expression level of the second reporter 
gene as compared to the basal expression level of the second reporter gene relative to the 
increase in expression of the first reporter gene. 
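
The two-reporter condition of claim 177 can be read as a comparison of fold increases over basal expression: both reporters rise above basal, and the second rises less than the first. The sketch below illustrates that reading only. The function names, the use of a simple signal-to-basal ratio, and the example intensities are assumptions made for illustration and are not specified in the claims.

```python
def fold_increase(signal: float, basal: float) -> float:
    """Reporter signal expressed as a multiple of its basal level."""
    if basal <= 0:
        raise ValueError("basal expression must be positive")
    return signal / basal


def matches_differential_profile(
    first_signal: float,
    first_basal: float,
    second_signal: float,
    second_basal: float,
) -> bool:
    """True when both reporters exceed basal and the second rises less than the first."""
    first = fold_increase(first_signal, first_basal)
    second = fold_increase(second_signal, second_basal)
    return first > 1.0 and 1.0 < second < first


# Hypothetical fluorescence intensities (arbitrary units).
print(matches_differential_profile(500.0, 50.0, 120.0, 60.0))
```

In the example call the first reporter is ten times its basal level and the second is twice its basal level, so the cell fits the profile.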

178. The method of claim 174, wherein the first and second reporter genes are operably 
linked to the same transcriptional regulatory sequence. 

179. The method of claim 174, wherein the first and second reporter genes are operably 
linked to separate copies of the same transcriptional regulatory sequence. 

180. The method of claim 174, wherein the first and second reporter genes are operably 
linked to separate copies of different transcriptional regulatory sequences. 

181. The method of claim 174, wherein the second reporter gene encodes a protein 
product which gives rise to a detectable signal selected from the group consisting of 
fluorescence, luminescence and a cell surface tag. 

182. The method of claim 181, wherein the second reporter gene encodes a fluorescent 
protein selected from the group consisting of green fluorescent protein (GFP), enhanced 
green fluorescent protein (EGFP), Renilla reniformis green fluorescent protein, GFPmut2, 
GFPuv4, enhanced yellow fluorescent protein (EYFP), enhanced cyan fluorescent protein 
(ECFP), enhanced blue fluorescent protein (EBFP), citrine and red fluorescent protein from 
Discosoma (dsRED). 

183. The method of claim 181, wherein the second reporter gene encodes a cell surface 
tag and the method further comprises the step of contacting the host cell with a 
fluorescently labeled antibody specific for the cell surface tag, thereby labeling the host 
cell, before isolation of host cells by FACS. 

184. The method of claim 174, wherein the first and second reporter genes encode 
proteins which can be analyzed independently, simultaneously, or independently and 
simultaneously. 

185. The method of claim 160, wherein the first and second fusion proteins are expressed 
from the same nucleic acid construct. 

186. The method of claim 160, wherein the first and second fusion proteins are expressed 
from separate nucleic acid constructs. 

187. The method of claim 160, wherein the expression level of the first, second, or first 
and second fusion proteins can be controlled by varying the growth conditions of the host 
cell. 

188. The method of claim 187, wherein the expression level of the first and second 
fusion proteins can be controlled by varying the concentration of IPTG, 
anhydrotetracycline, or IPTG and anhydrotetracycline to which the host cell is exposed. 

189. The method of claim 188, wherein the first, second, or first and second fusion 
proteins are expressed from a promoter comprising a binding site for the lac repressor or the 
tet repressor. 

190. The method of claim 187, wherein the expression level of the first and second 
fusion proteins can be independently controlled. 

191. A method for selecting a polypeptide that differentially interacts with at least two 
different test polypeptides, comprising: 

i providing a population of host cells wherein each cell contains 

(a) a first reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a first DNA-binding domain, 

(b) a second reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a second DNA-binding domain, 

(c) a first chimeric gene which encodes a first fusion protein, the first 
fusion protein including a first DNA-binding domain and a first test 
polypeptide, 

(d) a second chimeric gene which encodes a second fusion protein, the 
second fusion protein including a second DNA-binding domain and a 
second test polypeptide, 

(e) a third chimeric gene which encodes a third fusion protein, the third 
fusion protein including an activation tag and third test polypeptide, 

wherein expression of the first and second reporter genes results in a signal 
detectable by FACS; 

wherein interaction of the first fusion protein and the third fusion protein in the host 
cell results in a desired level of expression of the first reporter gene; 

wherein interaction of the second fusion protein and the third fusion protein in the 
host cell results in a desired level of expression of the second reporter gene; and 

ii isolating host cells comprising a third fusion protein capable of interacting 
with the first fusion protein, the second fusion protein, or the first and the second 
fusion proteins based on a desired level of expression of the first reporter gene, the 
second reporter gene, or the first and second reporter genes, respectively, using 
FACS, thereby selecting a polypeptide that differentially interacts with at least two 
different test polypeptides. 



192. The method of claim 191, wherein host cells are isolated which comprise a third 
fusion protein that interacts with both the first and second fusion proteins. 

193. The method of claim 191, wherein host cells are isolated which comprise a third 
fusion protein that interacts with only one of the first or second fusion proteins. 
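
The outcomes distinguished in claims 192 and 193 correspond to the four quadrants of a two-color FACS plot. The sketch below shows one way such a classification could be expressed. It is illustrative only: the function name `classify_cell`, the fold-over-basal inputs, the threshold of 2.0, and the sample values are all hypothetical and do not come from the specification.

```python
from collections import Counter


def classify_cell(first_fold: float, second_fold: float, threshold: float = 2.0) -> str:
    """Assign a cell to a sort bin from the fold increase of each reporter.

    A reporter counts as activated when its fold increase over basal
    reaches the (hypothetical) threshold.
    """
    first_on = first_fold >= threshold
    second_on = second_fold >= threshold
    if first_on and second_on:
        return "both"          # profile of claim 192
    if first_on:
        return "first_only"    # profile of claim 193
    if second_on:
        return "second_only"   # profile of claim 193
    return "neither"


# Hypothetical per-cell fold increases: (first reporter, second reporter).
cells = [(8.0, 7.5), (6.0, 1.1), (0.9, 5.0), (1.0, 1.2)]
bins = Counter(classify_cell(first, second) for first, second in cells)
print(dict(bins))
```

A real sort would set the gates from control populations rather than a fixed ratio; the fixed threshold here only keeps the example short.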

194. The method of claim 191, wherein the host cell further comprises 

(e) a third reporter gene operably linked to a transcriptional regulatory sequence 
which includes a binding site (DBD recognition element) for a third DNA-binding domain, 

(f) a fourth chimeric gene which encodes a fourth fusion protein including a 
third DNA-binding domain and a fourth test polypeptide, 

wherein expression of the third reporter gene results in a signal detectable by FACS; 

wherein interaction of the fourth fusion protein and the third fusion protein in the 
host cell results in a desired level of expression of the third reporter gene; and 

wherein step ii further comprises isolating host cells comprising the third fusion 
protein interacting with a fourth fusion protein based on a desired level of expression of the 
first, second and third reporter genes using FACS. 



195. The method of claim 194, wherein the host cell further comprises 

(g) a fourth reporter gene operably linked to a transcriptional regulatory 
sequence which includes a binding site (DBD recognition element) for a fourth 
DNA-binding domain, 

(h) a fifth chimeric gene which encodes a fifth fusion protein including a fourth 
DNA-binding domain and a fifth test polypeptide, 

wherein expression of the fourth reporter gene results in a signal detectable by 
FACS; 

wherein interaction of the fifth fusion protein and the third fusion protein in the host 
cell results in a desired level of expression of the fourth reporter gene; and 

wherein step ii further comprises isolating host cells comprising the third fusion 
protein interacting with a fifth fusion protein based on a desired level of expression of the 
first, second, third and fourth reporter genes using FACS. 



196. The method of claim 195, wherein the host cell further comprises 

(i) a fifth reporter gene operably linked to a transcriptional regulatory sequence 
which includes a binding site (DBD recognition element) for a fifth DNA-binding domain, 

(j) a sixth chimeric gene which encodes a sixth fusion protein including a fifth 
DNA-binding domain and a fifth test polypeptide, 

wherein expression of the fifth reporter gene results in a signal detectable by FACS; 

wherein interaction of the sixth fusion protein and the third fusion protein in the host 
cell results in a desired level of expression of the fifth reporter gene; and 

wherein step ii further comprises isolating host cells comprising the third fusion 
protein interacting with a sixth fusion protein based on a desired level of expression of the 
first, second, third, fourth and fifth reporter genes using FACS. 

197. The method of any one of claims 191 or 194-196, wherein host cells are isolated 
which comprise a third fusion protein that interacts to a desired extent with all of the other 
fusion proteins. 

198. The method of any one of claims 191 or 194-196, wherein host cells are isolated 
which comprise a third fusion protein that interacts with one of the fusion proteins to a 
greater extent than it interacts with the other fusion proteins. 

199. The method of any one of claims 191 or 194-196, wherein host cells are isolated 
which comprise a third fusion protein that interacts to a desired extent with a desired 
combination of at least two of the other fusion proteins. 

200. The method of any one of claims 191 or 194-196, which further comprises the step 
of identifying nucleic acids which encode fusion proteins resulting in a desired level of 
expression of a reporter gene. 

201. The method of claim 191, wherein the host cell is a eukaryotic cell. 

202. The method of claim 201, wherein the host cell is a yeast cell. 

203. The method of claim 191, wherein the host cell is a prokaryotic cell. 

204. The method of claim 203, wherein the host cell is selected from the group consisting 
of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, 
Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella. 

205. A method for detecting an interaction between a test polypeptide and a DNA 
sequence, comprising 

i providing a population of host cells wherein each cell contains 

(a) a reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a DNA-binding domain, 

(b) a chimeric gene which encodes a fusion protein, the fusion protein 
including a test polypeptide and an activation tag, 

wherein expression of the reporter gene results in signal detectable by FACS; 

wherein interaction between a test polypeptide of a fusion protein and a DBD 
recognition element in a host cell results in a desired level of expression of the reporter 
gene; and 

ii isolating host cells comprising a fusion protein that interacts with a DBD 
recognition element based on a desired level of expression of the reporter gene using FACS 
thereby detecting an interaction between the test polypeptide and the DBD recognition 
element DNA sequence. 



206. The method of claim 205, wherein the activation tag is an RNA polymerase, an 
RNA polymerase subunit, a functional fragment of an RNA polymerase, or a functional 
fragment of an RNA polymerase subunit. 

207. The method of claim 205, wherein the activation tag is an RNA polymerase, an 
RNA polymerase subunit, a functional fragment of an RNA polymerase, a functional 
fragment of an RNA polymerase subunit, a molecule covalently fused to RNA polymerase, 
a molecule covalently fused to an RNA polymerase subunit, a molecule covalently fused to 
a functional fragment of RNA polymerase, or a molecule covalently fused to a functional 
fragment of an RNA polymerase subunit. 

208. The method of claim 205, wherein the activation tag interacts indirectly with RNA 
polymerase via at least one intermediary polypeptide, nucleic acid, or small molecule, 
which functionally links the activation tag to the RNA polymerase. 

209. The method of claim 207, wherein the activation tag is a fragment of Gal11P, and 
wherein the activation tag interacts with a fusion between Gal4 and the α subunit of RNA 
polymerase. 

210. The method of claim 205, which further comprises the step of isolating the nucleic 
acid which encodes the test polypeptides. 

211. The method of claim 205, wherein the DBD recognition element, the fusion protein, or 
the DBD recognition element and the fusion protein are members of a library. 

212. The method of claim 211, wherein the DBD recognition element is part of a library 
of at least 10^ members, the fusion protein is part of a library of at least 10^ members, or the 
DBD recognition element and the fusion protein are both members of a library such that at 
least 10^ unique pairs of a DBD recognition element and a fusion protein could be tested for 
interaction. 

213. The method of claim 211, wherein the DBD recognition element is part of a library 
of at least 10^ members, the fusion protein is part of a library of at least 10^ members, or the 
DBD recognition element and the fusion protein are both members of a library such that at 
least 10^ unique pairs of a DBD recognition element and a fusion protein could be tested for 
interaction. 

214. The method of claim 205, wherein the host cell is a eukaryotic cell. 

215. The method of claim 214, wherein the host cell is a yeast cell. 

216. The method of claim 205, wherein the host cell is a prokaryotic cell. 

217. The method of claim 216, wherein the host cell is selected from the group consisting 
of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, 
Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella. 

218. The method of claim 205, wherein the host cells comprising a fusion protein that 
interacts with a DBD recognition element are isolated based on measuring a detectable 
signal conferred by a desired expression level of the reporter gene. 

219. The method of claim 218, wherein the detectable signal is selected from the group 
consisting of color, fluorescence, luminescence and a cell surface tag. 

220. The method of claim 205, wherein the DBD recognition element is a member of a 
library of binding sites for a DNA binding domain and host cells comprising a DBD 
recognition element bound by the polypeptide are isolated. 

221. The method of claim 220, wherein the polypeptide is a zinc finger protein. 

222. The method of claim 205, wherein the DBD recognition element is a desired 
binding site for a DNA binding domain and the test polypeptide is a member of a library 
and host cells comprising a polypeptide which binds to the DBD recognition element are 
isolated. 

223. The method of claim 222, wherein the polypeptides are zinc finger proteins. 

224. The method of claim 205, wherein the DBD recognition element is a member of a 
library of potential binding sites for a DNA binding domain and the test polypeptide is a 
member of a library of polypeptides and host cells comprising a polypeptide that binds a 
DBD recognition element are isolated. 

225. The method of claim 224, wherein the polypeptides are zinc finger proteins. 

226. A polypeptide isolated by the method of any one of claims 205, 212, 213, 222 or 
224. 

227. The polypeptide of claim 226 which is a zinc finger protein. 

228. A binding site for a DNA binding domain isolated by the method of any one of 
claims 205, 212, 213, 220 or 224. 

229. The binding site for a DNA binding domain of claim 228 which binds a zinc finger 
protein. 

230. An interacting pair of a polypeptide and a binding site for a DNA binding domain 
isolated by the method of any one of claims 205, 212, 213, 220, 222 or 224. 



231. The interacting pair of claim 230, wherein the polypeptide is a zinc finger protein. 

232. A method for selecting a polypeptide that differentially interacts with at least two 
different DNA sequences, comprising 

i providing a population of host cells each of which contains 

(a) a first reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a first DNA-binding domain, 

(b) a second reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a second DNA-binding domain, 

(c) a chimeric gene which encodes a fusion protein, the fusion protein 
including a test polypeptide and an activation tag, 

wherein expression of the first and second reporter genes results in a signal 
detectable by FACS; 

wherein interaction of a fusion protein with the first DBD recognition element in the 
host cells results in a desired level of expression of the first reporter gene; 

wherein interaction of a fusion protein with the second DBD recognition element in 
the host cells results in a desired level of expression of the second reporter gene; and 

ii isolating host cells comprising a fusion protein that interacts with the first 
DBD recognition element, the second DBD recognition element, or the first and second 
DBD recognition elements based on a desired level of expression of the first reporter gene, 
the second reporter gene, or the first and second reporter genes, respectively, using FACS, 
thereby selecting a polypeptide that differentially interacts with at least two different DNA 
sequences. 



233. The method of claim 232, wherein fusion proteins are selected that interact to a 
desired extent with at least three different DNA sequences operably linked to reporter 
genes. 

234. The method of claim 233, wherein fusion proteins are selected that interact to a 
desired extent with at least four different DNA sequences operably linked to reporter genes. 

235. The method of claim 232, which further comprises the step of isolating the nucleic 
acid which encodes the fusion protein. 

236. The method of claim 232, wherein each reporter gene may be detected 
independently, simultaneously, or independently and simultaneously. 

237. The method of claim 232, wherein the desired level of expression of at least one of 
the reporter genes is an increase in reporter gene expression as compared to the basal 
expression level of the reporter gene. 

238. The method of claim 232, wherein host cells are isolated which have one reporter 
gene whose level of expression is increased to a greater extent than the increase in the level 
of expression of the other reporter genes, relative to the basal expression level of the 
reporter genes. 

239. The method of claim 232, which further comprises the step of isolating the nucleic 
acid which encodes the test polypeptides. 

240. The method of claim 232, wherein the host cell is a eukaryotic cell. 

241. The method of claim 240, wherein the host cell is a yeast cell. 

242. The method of claim 232, wherein the host cell is a prokaryotic cell. 

243. The method of claim 242, wherein the host cell is selected from the group consisting 
of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, 
Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella. 



244. A method for detecting an interaction between a test RNA binding domain 
polypeptide and an RNA sequence, comprising 

i providing a population of host cells wherein each cell contains 

(a) a reporter gene operably linked to a transcriptional regulatory 
sequence which includes one or more binding sites (DBD recognition 
elements) for a DNA-binding domain, 

(b) a first chimeric gene which encodes a fusion protein, the fusion 
protein including a DNA-binding domain and a first RNA binding 
domain, 

(c) a second chimeric gene which encodes a fusion protein, the fusion 
protein including an activation tag and a second RNA binding 
domain, 

(d) a third chimeric gene which encodes a hybrid RNA, the hybrid RNA 
comprising a first RNA sequence that binds one of the first or second 
RNA binding domains and a second RNA sequence to be tested for 
interaction with the RNA-binding domain not bound to the first RNA 
sequence; 

wherein the expression of the reporter gene produces a signal detectable by FACS; 

wherein interaction of an RNA-binding domain not bound to the first RNA 
sequence with the second RNA sequence in a host cell results in a desired level of 
expression of the reporter gene; and 

ii isolating host cells comprising an RNA-binding domain that interacts with 
the second RNA sequence based on a desired level of expression of the reporter gene 
thereby detecting an interaction between a test RNA binding domain polypeptide and an 
RNA sequence using FACS. 

245. The method of claim 244, which further comprises the step of isolating the nucleic 
acid which encodes the test RNA-binding domain polypeptide or the nucleic acid which 
encodes the portion of the RNA sequence bound by the test RNA-binding domain 
polypeptide. 

246. The method of claim 244, wherein the host cell is a eukaryotic cell. 

247. The method of claim 246, wherein the host cell is a yeast cell. 

248. The method of claim 244, wherein the host cell is a prokaryotic cell. 

249. The method of claim 248, wherein the host cell is selected from the group consisting 
of bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, 
Serratia, Streptococcus, Lactobacillus, Enterococcus and Shigella. 

250. A kit for selecting a polypeptide that interacts with a test polypeptide, comprising: 

i a first gene construct for encoding a first fusion protein, which first gene 
construct comprises: 




(a) transcriptional and translational elements which direct e}q)ression of 
a protein in a host cell, 

(b) a DNA sequence that encodes a DNA binding domain and which is 
operably linked with the transcriptional and translational elements of 
the first gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a first test 
polypeptide into the first gene construct in such a manner that the 
first test polypeptide is expressed in-frame as part of a fiision protein 
containing the DNA binding domain; 

ii a second gene construct for encoding a second fiision protein, which second 
gene construct comprises: 

(a) transcriptional and translational elements which direct expression of 
a protein in a host cell, 

(b) a DNA sequence that encodes an activation tag and which is 
operably linked with the transcriptional and translational elements of 
the second gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a second 
test polypeptide into the second gene construct in such a manner that 
the second test polypeptide is expressed in-fi:ame as part of a fiision 
protein containing the activation tag; 

iii a host cell containing at least one reporter gene having one or more bmding 
sites (DBD recognition elements) for the DNA binding domain; 

wherein expression of the reporter gene produces a signal detectable by FACS; and 

wherein a desired level of expression of the reporter gene is obtained upon 
interaction of the first and second fiision proteins and can by analyzed usuig FACS. 

25 1 . The kit of claim 250, wherein the host cell is a eukaryotic cell. 
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252. The kit of claim 25 1 , wherein the host cell is a yeast cell. 

253. The kit of claim 250, wherein the host cell is a prokaryotic cell. 

254. The kit of claim 253, wherein the host cell is selected from the group consisting of 
bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, Serratia, 
Streptococcus, Lactobacillus, Enterococcus and Shigella. 

255. The kit of claim 250, wherein the first, second, or first and second fusion proteins 
are a member of a library. 

256. A test polypeptide isolated using the kit of claim 250. 

257. A kit for detecting an interaction between a test DNA-binding domain polypeptide 
and a DNA sequence, comprising: 

i a first gene construct which comprises: 

(a) one or more sites for inserting a DNA sequence comprising a 
transcriptional element which includes at least one binding site (DBD 
recognition element) for a DNA-binding domain, 

(b) a translational element operably linked to the transcriptional element, 
and 

(c) a DNA sequence for at least one reporter gene which is operably 
linked with the transcriptional and translational elements of the first 
gene construct, and 

wherein the transcriptional and translational elements direct expression of 
the reporter gene in a host cell; 

ii a second gene construct for encoding a first fusion protein, which second 
gene construct comprises: 

(a) transcriptional and translational elements which direct expression of 
a protein in a host cell, 

(b) a DNA sequence that encodes an activation tag and which is 
operably linked with the transcriptional and translational elements of 
the second gene construct, and 

(c) one or more sites for inserting a DNA sequence encoding a first test 
polypeptide into the second gene construct in such a manner that the 
first test polypeptide is expressed in-frame as part of a fusion protein 
containing the activation tag; 

iii a host cell; 

wherein expression of the reporter gene produces a signal detectable by FACS; and 

wherein a desired level of expression of the reporter gene is obtained upon 
interaction of a test polypeptide with a DBD recognition element and can be analyzed by 
FACS. 



258. The kit of claim 257, wherein the activation tag is an RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, or a functional fragment 
of an RNA polymerase subunit. 



259. The kit of claim 257, wherein the activation tag is an RNA polymerase, an RNA 
polymerase subunit, a functional fragment of an RNA polymerase, a functional fragment of 
an RNA polymerase subunit, a molecule covalently fused to RNA polymerase, a molecule 
covalently fused to an RNA polymerase subunit, a molecule covalently fused to a 
functional fragment of RNA polymerase, or a molecule covalently fused to a functional 
fragment of an RNA polymerase subunit. 

260. The kit of claim 257, wherein the activation tag interacts indirectly with RNA 
polymerase via at least one intermediary polypeptide, nucleic acid, or small molecule, 
which functionally links the activation tag to the RNA polymerase. 

261. The kit of claim 259, wherein the activation tag is a fragment of Gal11P, and 
wherein the activation tag interacts with a fusion between Gal4 and the α subunit of RNA 
polymerase. 

262. The kit of claim 257, wherein the host cell is a eukaryotic cell. 

263. The kit of claim 262, wherein the host cell is a yeast cell. 

264. The kit of claim 257, wherein the host cell is a prokaryotic cell. 

265. The kit of claim 264, wherein the host cell is selected from the group consisting of 
bacterial strains of Escherichia, Bacillus, Streptomyces, Pseudomonas, Salmonella, Serratia, 
Streptococcus, Lactobacillus, Enterococcus and Shigella. 


266. The kit of claim 264, wherein the first, second, or first and second fusion proteins 
are a member of a library. 

267. A test DNA-binding domain polypeptide isolated using the kit of claim 257. 

268. The DNA-binding domain polypeptide of claim 267, which is a zinc finger protein. 

269. A binding site for a DNA binding domain isolated using the kit of claim 257. 

270. The binding site for a DNA binding domain of claim 269 that binds a zinc finger 
protein. 

271. An interacting pair of a polypeptide and a binding site for a DNA binding domain 
isolated using the kit of claim 257. 

272. The interacting pair of claim 271, wherein the polypeptide is a zinc finger protein. 

P53ZF IN VITRO SITE SELECTION CONSENSUS SEQUENCE: 

CXGGA CACGT X 
(WHERE X = NO CLEAR PREFERENCE) 



IN VIVO SITE SELECTION LIBRARY 

CGGGA NNNNN G 
(WHERE N = A MIXTURE OF A, G, C, AND T) 



SELECTED CLONES: 

SEQUENCE          # OF CLONES 

CGGGA CACGT G     9 
CGGGA CATGT G     5 
CGGGA CACGG G     2 



SEQUENCE          FOLD ACTIVATION 

CGGGA CACGT G     13.6 ± 2.7 
CGGGA CATGT G     ^2^0 ± 0.5 
CGGGA CACGG G     12.6 ± 1.9 



FIG. 9 
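
The relationship in FIG. 9 between the in vivo library, the in vitro consensus, and the selected clones can be checked with a short script. The following is only an illustrative sketch and is not part of the disclosed method: the function names (`library_size`, `mismatches`) are hypothetical, the spaces in the printed sequences are treated as visual grouping only, and N and X are both treated as positions that accept any base.

```python
# Illustrative sketch only; names and helper functions are hypothetical.
# Sequences are copied from FIG. 9 with the grouping spaces removed.

LIBRARY = "CGGGANNNNNG"    # in vivo site selection library (N = A, G, C, or T)
CONSENSUS = "CXGGACACGTX"  # in vitro consensus (X = no clear preference)

# Selected clones and the number of times each was recovered.
SELECTED_CLONES = {
    "CGGGACACGTG": 9,
    "CGGGACATGTG": 5,
    "CGGGACACGGG": 2,
}


def library_size(pattern: str) -> int:
    """Number of distinct sequences encoded by a pattern with N positions."""
    return 4 ** pattern.count("N")


def mismatches(pattern: str, seq: str) -> int:
    """Count fixed positions of `pattern` that differ from `seq`.

    Positions marked N or X in the pattern accept any base.
    """
    if len(pattern) != len(seq):
        raise ValueError("pattern and sequence must have the same length")
    return sum(1 for p, s in zip(pattern, seq) if p not in "NX" and p != s)


if __name__ == "__main__":
    print("library size:", library_size(LIBRARY))
    for seq, count in SELECTED_CLONES.items():
        print(seq, "clones:", count,
              "mismatches vs. consensus:", mismatches(CONSENSUS, seq))
```

Under these assumptions the five randomized positions give 4^5 = 1024 possible library members, the most frequently recovered clone matches every fixed position of the in vitro consensus, and each of the other two clones differs from it at a single position.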



